operator: rewrite move region related functions #1667

Merged
merged 32 commits into from Aug 22, 2019

Luffbee (Contributor) commented Aug 6, 2019

What problem does this PR solve?

  • Make the logic of matchPeerSteps much clearer.
  • Fix bug: CreateMoveRegionOperator and matchPeerSteps don't check the RejectLeader label.
  • Unify the order of parameters and return values.

What is changed and how it works?

  • Extract moveRegionSteps and reuse it in matchPeerSteps.
  • Extract transferLeaderToAnySteps to select the new leader.

Check List

Tests

  • Unit test

disksing (Contributor) commented Aug 7, 2019

@Connor1996 PTAL

Connor1996 (Member) left a comment

Please add and remove peers one by one, and make sure the leader is the last to be removed.
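
One way to read this request is as an ordering rule for the generated steps. The sketch below is a minimal illustration under assumptions, not the code from this PR: `planPeerMoves`, its parameters, and the string step labels are all hypothetical, and the new leader is passed in by the caller rather than selected by the cluster.

```go
package main

import "fmt"

// planPeerMoves is a hypothetical helper, not PD's implementation. It orders
// the steps for replacing the peers on `remove` with new peers on `add`:
// one add followed by one remove at a time, with the leader's store removed
// last. newLeader is the store that takes over leadership before the old
// leader's peer is removed; how it is chosen is outside this sketch.
func planPeerMoves(leader, newLeader uint64, add, remove []uint64) []string {
	// Move the leader's store to the end of the removal order.
	ordered := make([]uint64, 0, len(remove))
	leaderRemoved := false
	for _, id := range remove {
		if id == leader {
			leaderRemoved = true
			continue
		}
		ordered = append(ordered, id)
	}
	if leaderRemoved {
		ordered = append(ordered, leader)
	}

	steps := make([]string, 0, len(add)+len(ordered)+1)
	for i := 0; i < len(add) || i < len(ordered); i++ {
		if i < len(add) {
			steps = append(steps, fmt.Sprintf("add peer on store %d", add[i]))
		}
		if i < len(ordered) {
			if ordered[i] == leader {
				// Hand leadership over before removing the old leader's peer.
				steps = append(steps, fmt.Sprintf("transfer leader from store %d to store %d", leader, newLeader))
			}
			steps = append(steps, fmt.Sprintf("remove peer on store %d", ordered[i]))
		}
	}
	return steps
}
```

With the leader on store 1, calling `planPeerMoves(1, 4, []uint64{4, 5}, []uint64{1, 2})` places the removal of store 1 at the very end, preceded by the leader transfer.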

Luffbee (Contributor, Author) commented Aug 7, 2019

/rebuild

Luffbee (Contributor, Author) commented Aug 7, 2019

@Connor1996 PTAL


var steps = make([]OpStep, 0, len(addPeerSteps)*2+len(rmPeerSteps)+len(tlSteps))
i, j := 0, 0
for ; i < len(addPeerSteps) && j < len(rmPeerSteps); i, j = i+1, j+1 {
Member commented:

Could you add some comments to illustrate why we want to generate steps like this?

Author (Contributor) commented:

Actually, I'm not very clear about why we should add and remove peers one by one. Could you give some reasons? I will add them to the comments.

Member commented:

I think it is not as clear as before.

}

// transferLeaderToAnySteps returns the first suitable store to become region leader,
func transferLeaderToAnySteps(leaderID uint64, storeIDs []uint64, cluster Cluster) (OpKind, []OpStep) {
Member commented:

How about just returning OpStep?

Author (Contributor) commented:

If we regard this function as a black box, we should not assume that it returns only a TransferLeader step when it succeeds.

What's more, I'd like the *Steps functions to have similar return values. Currently there are three such functions with different return values:

  • transferLeaderToAnySteps (this function): no error.
  • CreateAddPeerSteps and CreateAddLightPeerSteps: no OpKind and no error.

I will add an error for this function, because it may fail.
The other two functions are exported, so I didn't touch them.
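
The return shape discussed in this thread (a kind, a slice of steps, and an error) can be sketched as follows. This is only an illustration under assumptions: `pickLeaderSteps`, the simplified `OpKind` and `OpStep` definitions, the `OpLeader` value, and the `allowLeader` predicate are hypothetical stand-ins and are not taken from the PR.

```go
package main

import "errors"

// OpKind and OpStep are simplified stand-ins for the types named in the
// signature above; their real definitions are not shown in this thread.
type OpKind uint32

// OpLeader is an assumed flag meaning "this operator moves a leader".
const OpLeader OpKind = 1

// OpStep here only models a leader transfer between two stores.
type OpStep struct {
	FromStore, ToStore uint64
}

// pickLeaderSteps is a hypothetical analogue of transferLeaderToAnySteps.
// allowLeader stands in for whatever check the cluster applies (for example,
// a store carrying a RejectLeader label would return false).
func pickLeaderSteps(leaderID uint64, storeIDs []uint64, allowLeader func(uint64) bool) (OpKind, []OpStep, error) {
	for _, id := range storeIDs {
		if allowLeader(id) {
			// The first suitable store wins.
			return OpLeader, []OpStep{{FromStore: leaderID, ToStore: id}}, nil
		}
	}
	// No candidate passed the check, so the caller gets an explicit error.
	return 0, nil, errors.New("no suitable store to become leader")
}
```

Returning a slice rather than a single step keeps the caller from depending on how many steps the helper produces, and the error makes the "no candidate" case explicit.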

Luffbee (Contributor, Author) commented Aug 8, 2019

@Connor1996 @rleungx PTAL

Luffbee changed the title from "Rewrite some functions in 'schedule/operator'." to "operator: rewrite move region related functions" Aug 8, 2019
Luffbee and others added 3 commits August 9, 2019 10:52
Co-Authored-By: Ryan Leung <rleungx@gmail.com>
Co-Authored-By: Ryan Leung <rleungx@gmail.com>
Luffbee (Contributor, Author) commented Aug 9, 2019

/rebuild

codecov-io commented Aug 9, 2019

Codecov Report

Merging #1667 into master will decrease coverage by 0.01%.
The diff coverage is 82.82%.

@@            Coverage Diff             @@
##           master    #1667      +/-   ##
==========================================
- Coverage   76.69%   76.68%   -0.02%     
==========================================
  Files         157      157              
  Lines       15489    15485       -4     
==========================================
- Hits        11880    11875       -5     
  Misses       2593     2593              
- Partials     1016     1017       +1
Impacted Files                           Coverage Δ
server/checker/merge_checker.go          77.04% <0%> (-1.29%) ⬇️
server/schedule/operator/operator.go     87.25% <83.67%> (-0.13%) ⬇️
server/kv/etcd_kv.go                     70.12% <0%> (-9.1%) ⬇️
server/schedulers/shuffle_hot_region.go  58.97% <0%> (-6.42%) ⬇️
server/schedulers/random_merge.go        61.53% <0%> (-5.13%) ⬇️
server/region_syncer/client.go           81.01% <0%> (-2.54%) ⬇️
server/tso/tso.go                        77.35% <0%> (-1.89%) ⬇️
server/grpc_service.go                   58.09% <0%> (-0.44%) ⬇️
client/client.go                         68.38% <0%> (+0.2%) ⬆️
server/cluster.go                        82.76% <0%> (+0.25%) ⬆️
... and 4 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 561d004...645ccfb.

Luffbee (Contributor, Author) commented Aug 13, 2019

/rebuild

nolouch (Contributor) left a comment

rest LGTM.

Luffbee (Contributor, Author) commented Aug 16, 2019

@Connor1996 @nolouch @rleungx PTAL

nolouch (Contributor) left a comment

LGTM

nolouch added the status/LGT1 label (Indicates that a PR has LGTM 1.) Aug 16, 2019
// c == [opA1, opA2, opB1, opA3, opB2, opA4, opA5, opA6, opB3, opB4, opB5, opB6]
//
// sizeHint is a hint for the length of returned slice.
func interleaveStepGroups(a, b [][]OpStep, sizeHint int) []OpStep {
Member commented:

Why do we need this hint?

Author (Contributor) commented:

This hint works like the length and capacity arguments to make(): it is only a performance optimization and is not required.
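
To make the role of the hint concrete, here is a minimal sketch of group interleaving that uses plain strings in place of OpStep. It is an assumption-laden illustration rather than the PR's code: `interleaveGroups` is a hypothetical name, and the input grouping in the usage note is only one grouping that is consistent with the example order in the doc comment quoted above, since the start of that comment is not shown.

```go
package main

// interleaveGroups is a hypothetical stand-in for interleaveStepGroups.
// It emits groups alternately, a[0], b[0], a[1], b[1], and so on, then
// appends whatever groups remain in the longer input.
// sizeHint only pre-sizes the result; any value gives the same output.
func interleaveGroups(a, b [][]string, sizeHint int) []string {
	if sizeHint < 0 {
		sizeHint = 0 // make panics on a negative capacity
	}
	out := make([]string, 0, sizeHint)

	i := 0
	for ; i < len(a) && i < len(b); i++ {
		out = append(out, a[i]...)
		out = append(out, b[i]...)
	}
	// At most one of these two loops has anything left to copy.
	for _, group := range a[i:] {
		out = append(out, group...)
	}
	for _, group := range b[i:] {
		out = append(out, group...)
	}
	return out
}
```

Assuming `a` is grouped as `{opA1, opA2}, {opA3}, {opA4, opA5, opA6}` and `b` as `{opB1}, {opB2}, {opB3, opB4, opB5, opB6}`, the result follows the order shown in the doc comment. Passing 0 as the hint produces the same slice; `append` simply has to grow the backing array more often.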

Luffbee (Contributor, Author) commented Aug 20, 2019

/rebuild

Luffbee (Contributor, Author) commented Aug 20, 2019

/rebuild

Luffbee (Contributor, Author) commented Aug 20, 2019

/rebuild

rleungx (Member) commented Aug 20, 2019

@Connor1996 PTAL.

Connor1996 (Member) left a comment

LGTM

Luffbee added the status/can-merge label (Indicates a PR has been approved by a committer.) Aug 22, 2019
sre-bot (Contributor) commented Aug 22, 2019

/run-all-tests

sre-bot merged commit 84f6a82 into tikv:master Aug 22, 2019
Luffbee pushed a commit that referenced this pull request Aug 27, 2019
* *: unify get store function everywhere (#1671)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

*  server: use leader lease to determine tso service validity (#1676)

Signed-off-by: disksing <i@disksing.com>

* test: fix tests (#1696)

* test: fix region syncer test

Signed-off-by: disksing <i@disksing.com>

* add config-check flag for pd-server (#1695)

Signed-off-by: cwen0 <cwenyin0@gmail.com>

* operator: rewrite move region related functions (#1667)

* *: support setting endKey for ScanRange (#1700)

Signed-off-by: disksing <i@disksing.com>

* *: reduce some unnecessary parameters (#1698)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* schedule: Do not send an operator of a region wth a stale epoch (#1659)

* schedule: Do not send an operator of a region wth a stale epoch

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* schedule: check the version changed by the operator self

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* schedule: fix unit test

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* schedule: fix to avoid dispatching a stale opstep

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* dispatch: refactor "ConsumeConfVer() int" to "ExpectConfVerChange() bool"

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* dispatch: fix typo in comment

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* fix typo

Co-Authored-By: Ryan Leung <rleungx@gmail.com>

* dispatch: fix unittest

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* dispatch: refine format

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* server: fix the dead lock in scatter region (#1706)

Signed-off-by: Ryan Leung <rleungx@gmail.com>
Luffbee added a commit that referenced this pull request Sep 9, 2019
* *: unify get store function everywhere (#1671)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* remove unnecessary parentheses

*  server: use leader lease to determine tso service validity (#1676)

Signed-off-by: disksing <i@disksing.com>

* change internal stat values to float64

* add pending operator influence

* add metrics of pending influence

* fix metrics

* fix panic

* adjust pending influence of balanceHotWrite

* change weight of pending influence

* test: fix tests (#1696)

* test: fix region syncer test

Signed-off-by: disksing <i@disksing.com>

* decrease region rolling window; store pending influence in scheduler

* add config-check flag for pd-server (#1695)

Signed-off-by: cwen0 <cwenyin0@gmail.com>

* decrease possiblility transfer hot write leader

* change pending influence weight

* add unstarted op metrics

* add logs for debug

* add log for debug

* add logs for debug

* add logs for debug

* add logs for debug

* add logs for debug

* add logs for debug

* add logs for debug

* Revert "add logs for debug"

This reverts commit e74c7a9.

* add metrics for hotspot operators

* operator: rewrite move region related functions (#1667)

* add metrics for pending operators

* *: support setting endKey for ScanRange (#1700)

Signed-off-by: disksing <i@disksing.com>

* fix bug

* fix bug

* fix bug

* fix metrics thread-safe bug

* fix logic bug

* *: reduce some unnecessary parameters (#1698)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* schedule: Do not send an operator of a region wth a stale epoch (#1659)

* schedule: Do not send an operator of a region wth a stale epoch

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* schedule: check the version changed by the operator self

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* schedule: fix unit test

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* schedule: fix to avoid dispatching a stale opstep

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* dispatch: refactor "ConsumeConfVer() int" to "ExpectConfVerChange() bool"

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* dispatch: fix typo in comment

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* fix typo

Co-Authored-By: Ryan Leung <rleungx@gmail.com>

* dispatch: fix unittest

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* dispatch: refine format

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* server: fix the dead lock in scatter region (#1706)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* add drop time for operator

* use IsDropped to recognize canceled ops

* try to fix trans leader burst

* try to fix trans leader burst

* add zombie influence

* change select src dst strategy; improve op_controller

* change select src strategy

* fix bug

* tools: fix set namespace in pd-ctl (#1701)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* tools: fix parse url without http prefix (#1703)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* tests: support deadlock detection in make test (#1704)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* Makefile: fix failpoint enable (#1722)

Signed-off-by: nolouch <nolouch@gmail.com>

* checker: fix the issue that a region does not merge to the sibling with smaller size (#1723)

Signed-off-by: disksing <i@disksing.com>

* tools: balance region simulator (#1708)

* scheduler: do not remove the operator when the step does not finish (#1715)

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* operator: fix the AddLearner config version judgment (#1732)

Signed-off-by: nolouch <nolouch@gmail.com>

* tools: fix TLS in pd control (#1729)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* syncer: support TLS for region syncer (#1728)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* schedule: fix a thread-safe bug and improve code (#1719)
Luffbee added a commit that referenced this pull request Sep 11, 2019
* *: unify get store function everywhere (#1671)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

*  server: use leader lease to determine tso service validity (#1676)

Signed-off-by: disksing <i@disksing.com>

* test: fix tests (#1696)

* test: fix region syncer test

Signed-off-by: disksing <i@disksing.com>

* add config-check flag for pd-server (#1695)

Signed-off-by: cwen0 <cwenyin0@gmail.com>

* operator: rewrite move region related functions (#1667)

* *: support setting endKey for ScanRange (#1700)

Signed-off-by: disksing <i@disksing.com>

* *: reduce some unnecessary parameters (#1698)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* schedule: Do not send an operator of a region wth a stale epoch (#1659)

* schedule: Do not send an operator of a region wth a stale epoch

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* schedule: check the version changed by the operator self

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* schedule: fix unit test

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* schedule: fix to avoid dispatching a stale opstep

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* dispatch: refactor "ConsumeConfVer() int" to "ExpectConfVerChange() bool"

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* dispatch: fix typo in comment

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* fix typo

Co-Authored-By: Ryan Leung <rleungx@gmail.com>

* dispatch: fix unittest

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* dispatch: refine format

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* server: fix the dead lock in scatter region (#1706)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* tools: fix set namespace in pd-ctl (#1701)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* tools: fix parse url without http prefix (#1703)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* tests: support deadlock detection in make test (#1704)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* Makefile: fix failpoint enable (#1722)

Signed-off-by: nolouch <nolouch@gmail.com>

* checker: fix the issue that a region does not merge to the sibling with smaller size (#1723)

Signed-off-by: disksing <i@disksing.com>

* tools: balance region simulator (#1708)

* scheduler: do not remove the operator when the step does not finish (#1715)

Signed-off-by: Shafreeck Sea <shafreeck@gmail.com>

* operator: fix the AddLearner config version judgment (#1732)

Signed-off-by: nolouch <nolouch@gmail.com>

* tools: fix TLS in pd control (#1729)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* syncer: support TLS for region syncer (#1728)

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* schedule: fix a thread-safe bug and improve code (#1719)

* statistics: fix region flow calculation (#1688)

Signed-off-by: jiyingtk <jiyingtk@mail.ustc.edu.cn>

* makefile: improve deadlock-enable/disable (#1736)

* api: fix missing keys statistic in region information (#1741)

Signed-off-by: nolouch <nolouch@gmail.com>

* *: update go version to 1.13 (#1742)

Signed-off-by: disksing <i@disksing.com>

* coordinator: add the operator cost time in log field (#1748)

Signed-off-by: nolouch <nolouch@gmail.com>
Labels

  • status/can-merge: Indicates a PR has been approved by a committer.
  • status/LGT1: Indicates that a PR has LGTM 1.
7 participants