Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add follower read support to TiDB #11347

Merged
merged 24 commits into from
Aug 16, 2019
Merged

Add follower read support to TiDB #11347

merged 24 commits into from
Aug 16, 2019

Conversation

sunxiaoguang
Copy link
Contributor

Signed-off-by: Xiaoguang Sun sunxiaoguang@zhihu.com

What problem does this PR solve?

Add replica read support to TiDB. Users can use tidb_replica_read session variable to choose reading from leader or follower. To make it consistent with existing behavior, leader will be used by default unless follower is explicitly specified otherwise.

What is changed and how it works?

Add a session scope variable tidb_replica_read to specify if TiDB should read data from leader or follower.

Check List

Tests

  • Unit test
    Will add later
  • Manual test (add detailed scripts or steps below)
    Will add later

Code changes

  • Has exported function/method change
  • Has exported variable/fields change
  • Has interface methods change

Side effects
NA

Related changes

  • Need to update the documentation
  • Need to be included in the release note

Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
@shenli shenli added the contribution This PR is from a community contributor. label Jul 21, 2019
kv/kv.go Outdated

const (
// ReplicaReadLeader stands for 'read from leader'.
ReplicaReadLeader ReplicaReadType = iota
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer using 1 << iota here, so we can use bit operations to easily check the type is set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though about this initially but doubt if it is rational to read from all kind of replicas. Read from different type of replicas may have different latency characteristics and pose different burden to leader, we can have more discussion about this.

Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
@codecov
Copy link

codecov bot commented Jul 23, 2019

Codecov Report

Merging #11347 into master will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##            master    #11347   +/-   ##
=========================================
  Coverage   81.727%   81.727%           
=========================================
  Files          434       434           
  Lines        94878     94878           
=========================================
  Hits         77541     77541           
  Misses       11879     11879           
  Partials      5458      5458

sunxiaoguang and others added 5 commits July 23, 2019 11:50
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
…_read

Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
sessionctx/variable/session.go Show resolved Hide resolved
store/tikv/region_cache.go Outdated Show resolved Hide resolved
store/tikv/scan.go Outdated Show resolved Hide resolved
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
@sre-bot
Copy link
Contributor

sre-bot commented Jul 31, 2019

Hi contributor, thanks for your PR.

This patch needs to be approved by someone of admins. They should reply with "/ok-to-test" to accept this PR for running test automatically.

sunxiaoguang and others added 4 commits August 1, 2019 10:31
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
@sunxiaoguang sunxiaoguang changed the title [DNM] Initial effort to add follower read to TiDB Add follower read support to TiDB Aug 2, 2019
@sunxiaoguang
Copy link
Contributor Author

I have made some changes according to what we agreed in last discussion. Please take a look, thanks. @overvenus @5kbpers

session/session.go Outdated Show resolved Hide resolved
store/tikv/kv.go Show resolved Hide resolved
@@ -973,6 +1016,7 @@ func (c *RegionCache) switchNextPeer(r *Region, currentPeerIdx int, err error) {
nextIdx := (currentPeerIdx + 1) % len(rs.stores)
newRegionStore := rs.clone()
newRegionStore.workStoreIdx = int32(nextIdx)
newRegionStore.initFollowers()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When switchNextPeer is called, the workStoreIdx may point to a follower, so initFollowers does not do what it supposed to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it trying to predict that the next peer is going to be the new leader? If that's the case workStoreIdx will actually become new leader to make non replica read work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we failed a request, we change the workStoreIdx, then access the follower to wake up the hibernated region.

It's not always the leader.

If the workStoreIdx points to a follower, it may be the only valid follower, as we avoid to access it, all requests will be sent to the real leader.

It works but the code it's misleading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see. Looks like simply removing initFollowers here will work, am I right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@@ -70,6 +71,7 @@ type RegionStore struct {
workStoreIdx int32 // point to current work peer in meta.Peers and work store in stores(same idx)
stores []*Store // stores in this region
storeFails []uint32 // snapshots of store's fail, need reload when `storeFails[curr] != stores[cur].fail`
followers []int32 // followers' index in this region
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just use a leader index instead.
When we trying to find a follower, we can just skip the leader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will have to check current store index every single time that way. Wouldn't it be even more cumbersome?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The followers don't have more information, we can remove it to save memory.
And the computation cost to avoid workStoreIdx is very little.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using seed passed in here. Which one are we going to use when it's pointing to leader? Whichever follower it chooses as a fallback, how to make sure it can really balance the load? In addition, arrays for a million regions only cost dozens of MB. Compare to other places, the footprint of this array is actually negligible.

Copy link
Member

@coocood coocood Aug 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numPeers := 5
followerIdx := seed % (numPeers-1)
if followerIdx >= workStoreIdx {
    followerIdx++
    if followerIdx == numPeers {
        followerIdx = 0
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant, let's do it this way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, followerIdx == numPeers will always be false.

This will do

numPeers := 5
followerIdx := seed % (numPeers-1)
if followerIdx >= workStoreIdx {
    followerIdx++
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

sunxiaoguang and others added 3 commits August 5, 2019 20:42
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
Copy link
Member

@coocood coocood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a follower store failed, the region info will still have that failed store, it will not be removed unless we manually remove it or wait tens of minutes for PD to schedule add another follower to the region and remove the failed one.

So we will keep requesting the failed store for follower read.

Another problem is that the region cache has an assumption that the failed store is leader, if we failed to request a follower, the region cache will switch the workStoreIdx and drop region cache which makes the leader read fail.

@sunxiaoguang
Copy link
Contributor Author

If a follower store failed, the region info will still have that failed store, it will not be removed unless we manually remove it or wait tens of minutes for PD to schedule add another follower to the region and remove the failed one.

So we will keep requesting the failed store for follower read.

Another problem is that the region cache has an assumption that the failed store is leader, if we failed to request a follower, the region cache will switch the workStoreIdx and drop region cache which makes the leader read fail.

For the first issue, it seems the old way of storing followers array makes better sense. When a follower is failed, it can be removed from valid followers array. Without storing such array, every single time trying to get a follower we need to check storeFails to skip it.

About the next issue, we can skip switching peer if current failure is caused by follower read If I understand it correctly.

@coocood
Copy link
Member

coocood commented Aug 6, 2019

@sunxiaoguang
A failed store may resume soon, if we remove it in the array, we may have no follower to use.

If storeFail doesn't match for a follower read, we should choose another follower and avoid invalidate the cache.

Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
@sunxiaoguang
Copy link
Contributor Author

sunxiaoguang commented Aug 6, 2019

@sunxiaoguang
A failed store may resume soon, if we remove it in the array, we may have no follower to use.

If storeFail doesn't match for a follower read, we should choose another follower and avoid invalidate the cache.

I made some changes to try different follower if selected one had failed. Please take a look, thanks.

@coocood
Copy link
Member

coocood commented Aug 7, 2019

@sunxiaoguang
If we remove the || in the unit tests, the logic of picking the valid follower can be covered.

go.sum Show resolved Hide resolved
@@ -94,8 +94,14 @@ func (s *RegionRequestSender) SendReqCtx(bo *Backoffer, req *tikvrpc.Request, re
}
})

var replicaRead kv.ReplicaReadType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about adding a metric for ReplicaReadType to let follower read time be observable in granfana

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me add it.

Copy link
Contributor Author

@sunxiaoguang sunxiaoguang Aug 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to add a new label replica_type to those duration metrics. But I'm not sure if is the right way to do it. Any suggestions? Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like many metrics are populated with specific label statically in code. Adding a new label to it would make it harder to do so.

Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
@@ -268,7 +268,7 @@ func (s *tikvStore) Begin() (kv.Transaction, error) {

// BeginWithStartTS begins a transaction with startTS.
func (s *tikvStore) BeginWithStartTS(startTS uint64) (kv.Transaction, error) {
txn, err := newTikvTxnWithStartTS(s, startTS)
txn, err := newTikvTxnWithStartTS(s, startTS, s.nextReplicaReadSeed())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will make ReadSeed be changed for each txn in same session, it seem conflict to #11347 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it seems to be complicated to make it consistent over txn and coprocessor. And the whole discussion would be more efficient over instant messaging, therefore we had some discussion on WeChat group and agreed that we can use different policy for coprocessor and txn. Sorry for not giving out the context and clue about it here.

@@ -277,7 +277,7 @@ func (s *tikvStore) BeginWithStartTS(startTS uint64) (kv.Transaction, error) {
}

func (s *tikvStore) GetSnapshot(ver kv.Version) (kv.Snapshot, error) {
snapshot := newTiKVSnapshot(s, ver)
snapshot := newTiKVSnapshot(s, ver, s.nextReplicaReadSeed())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@lysu
Copy link
Contributor

lysu commented Aug 16, 2019

LGTM

@coocood coocood added status/LGT2 Indicates that a PR has LGTM 2. status/can-merge Indicates a PR has been approved by a committer. labels Aug 16, 2019
@sre-bot
Copy link
Contributor

sre-bot commented Aug 16, 2019

/run-all-tests

@sre-bot sre-bot merged commit 523b936 into pingcap:master Aug 16, 2019
@sunxiaoguang sunxiaoguang deleted the replica_read branch August 16, 2019 05:35
coocood pushed a commit to coocood/tidb that referenced this pull request Nov 14, 2019
cherry pick pingcap#11347 to 3.1

Fix code conflicts from 4.0 to 3.1. Users can use tidb_replica_read session variable to choose reading from leader or follower. To make it consistent with existing behavior, leader will be used by default unless follower is explicitly specified otherwise.

Add a session scope variable tidb_replica_read to specify if TiDB should read data from leader or follower.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/tikv contribution This PR is from a community contributor. release-note status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/new-feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants