executor: global kill 32bits (local connID part) #25385

Merged
57 commits merged into pingcap:master from global_kill_32_local-conn-id on Jun 6, 2023

pingyu
Contributor

@pingyu pingyu commented Jun 13, 2021

What problem does this PR solve?

Issue Number: ref #8854
(This PR covers ONLY the allocation of 32-bit local connection IDs. The server ID part will follow in the next PR.)

Problem Summary:
Support killing a connection or query with CTRL-C or KILL by implementing global connection IDs.

What is changed and how it works?

What's Changed:

  • Extract the connection ID allocation logic (as ConnectionIDAllocator) from the original GlobalConnID to improve code organization.
  • Modify the global connection ID structure to support 32 bits, according to the design doc.
  • Implement a lock-free ring buffer to support 32-bit local connection ID allocation.

How it Works:

  • Lock-free ring buffer: based on the ring buffer implementation from Go-datastructures, with the code simplified to support uint32 values and the sequence comparison adjusted to improve performance.
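
For readers unfamiliar with the technique, here is a minimal sketch of a bounded lock-free ring buffer for uint32 values, in the general style of sequence-numbered MPMC queues. It is not the code from this PR: the names `ringPool`, `newRingPool`, `Put` and `Get` are hypothetical, and the non-blocking "return false when full or empty" behavior is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// slot pairs a value with a sequence number that tells producers and
// consumers whose turn it is to touch the slot.
type slot struct {
	seq   uint32
	value uint32
}

// ringPool is a hypothetical bounded multi-producer/multi-consumer queue.
type ringPool struct {
	mask  uint32 // capacity - 1; capacity is a power of two
	head  uint32 // next position to read
	tail  uint32 // next position to write
	slots []slot
}

// newRingPool creates a pool with capacity 2^sizeExp.
func newRingPool(sizeExp uint) *ringPool {
	n := uint32(1) << sizeExp
	p := &ringPool{mask: n - 1, slots: make([]slot, n)}
	for i := range p.slots {
		p.slots[i].seq = uint32(i) // slot i is writable at position i
	}
	return p
}

// Put appends v and returns false when the pool is full.
func (p *ringPool) Put(v uint32) bool {
	for {
		pos := atomic.LoadUint32(&p.tail)
		s := &p.slots[pos&p.mask]
		diff := int32(atomic.LoadUint32(&s.seq) - pos)
		switch {
		case diff == 0: // slot is free for this position: try to claim it
			if atomic.CompareAndSwapUint32(&p.tail, pos, pos+1) {
				s.value = v
				atomic.StoreUint32(&s.seq, pos+1) // publish to consumers
				return true
			}
		case diff < 0: // slot still holds an unread value: pool is full
			return false
		}
		// Otherwise another producer claimed the position first; retry.
	}
}

// Get removes the oldest value and returns false when the pool is empty.
func (p *ringPool) Get() (uint32, bool) {
	for {
		pos := atomic.LoadUint32(&p.head)
		s := &p.slots[pos&p.mask]
		diff := int32(atomic.LoadUint32(&s.seq) - (pos + 1))
		switch {
		case diff == 0: // slot holds a published value for this position
			if atomic.CompareAndSwapUint32(&p.head, pos, pos+1) {
				v := s.value
				atomic.StoreUint32(&s.seq, pos+p.mask+1) // free for the next lap
				return v, true
			}
		case diff < 0: // nothing published yet: pool is empty
			return 0, false
		}
	}
}

func main() {
	pool := newRingPool(2) // capacity 4
	for id := uint32(1); id <= 4; id++ {
		pool.Put(id)
	}
	id, ok := pool.Get()
	fmt.Println(id, ok)
}
```

The signed comparison of sequence numbers is what lets the positions wrap around the uint32 range safely, as long as the capacity stays well below 2^31.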

Related changes

Check List

Tests

  • Unit test
    Verify the correctness of the lock-free ring buffer with Go's race detector and a long-running stress test.
$ go test -race -check.f testGlobalConnIDSuite
PASS: global_conn_id_test.go:375: testGlobalConnIDSuite.TestLockBasedPoolConcurrencySafety	2.235s
PASS: global_conn_id_test.go:85: testGlobalConnIDSuite.TestLockFreePoolBasic	0.001s
PASS: global_conn_id_test.go:352: testGlobalConnIDSuite.TestLockFreePoolBasicConcurrencySafety	12.022s
PASS: global_conn_id_test.go:405: testGlobalConnIDSuite.TestLockFreePoolConcurrencySafety	308.628s
PASS: global_conn_id_test.go:129: testGlobalConnIDSuite.TestLockFreePoolInitEmpty	0.001s
PASS: global_conn_id_test.go:40: testGlobalConnIDSuite.TestParse	0.000s
OK: 6 passed
PASS
ok  	github.com/pingcap/tidb/util	323.805s
$
$ go test -check.f TestLockFreePoolConcurrencySafety -count 100
PASS: global_conn_id_test.go:405: testGlobalConnIDSuite.TestLockFreePoolConcurrencySafety	0.787s
OK: 1 passed
PASS: global_conn_id_test.go:405: testGlobalConnIDSuite.TestLockFreePoolConcurrencySafety	0.783s
OK: 1 passed
PASS: global_conn_id_test.go:405: testGlobalConnIDSuite.TestLockFreePoolConcurrencySafety	0.775s
OK: 1 passed
PASS: global_conn_id_test.go:405: testGlobalConnIDSuite.TestLockFreePoolConcurrencySafety	0.754s
OK: 1 passed
PASS: global_conn_id_test.go:405: testGlobalConnIDSuite.TestLockFreePoolConcurrencySafety	0.781s
OK: 1 passed
...
...
PASS: global_conn_id_test.go:405: testGlobalConnIDSuite.TestLockFreePoolConcurrencySafety	1.122s
OK: 1 passed
PASS
ok  	github.com/pingcap/tidb/util	109.635s
  • Integration test
    By tests/globalkilltest

  • Benchmark

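    • Benchmark of the lock-based vs. the lock-free ring buffer. The lock-free version is about 17% faster with 10000 producers + 10000 consumers (87.6 ms/op vs. 105.6 ms/op), but slower at low concurrency (1:1 through 10:10).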


$ go test -v -benchmem -run="BenchmarkPoolConcurrency" -bench="BenchmarkPoolConcurrency"
goos: darwin
goarch: amd64
pkg: github.com/pingcap/tidb/util
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkPoolConcurrency
BenchmarkPoolConcurrency/LockBasedPool:_P:C:_1:1
BenchmarkPoolConcurrency/LockBasedPool:_P:C:_1:1-12         	      91	  12994975 ns/op	      16 B/op	       0 allocs/op
BenchmarkPoolConcurrency/LockFreePool:_P:C:_1:1
BenchmarkPoolConcurrency/LockFreePool:_P:C:_1:1-12          	      48	  23977503 ns/op	       0 B/op	       0 allocs/op
BenchmarkPoolConcurrency/LockBasedPool:_P:C:_3:3
BenchmarkPoolConcurrency/LockBasedPool:_P:C:_3:3-12         	      42	  25349584 ns/op	     153 B/op	       1 allocs/op
BenchmarkPoolConcurrency/LockFreePool:_P:C:_3:3
BenchmarkPoolConcurrency/LockFreePool:_P:C:_3:3-12          	      27	  43398540 ns/op	       0 B/op	       0 allocs/op
BenchmarkPoolConcurrency/LockBasedPool:_P:C:_10:10
BenchmarkPoolConcurrency/LockBasedPool:_P:C:_10:10-12       	      30	  38976866 ns/op	     454 B/op	       4 allocs/op
BenchmarkPoolConcurrency/LockFreePool:_P:C:_10:10
BenchmarkPoolConcurrency/LockFreePool:_P:C:_10:10-12        	      19	  62496581 ns/op	       5 B/op	       0 allocs/op
BenchmarkPoolConcurrency/LockBasedPool:_P:C:_100:100
BenchmarkPoolConcurrency/LockBasedPool:_P:C:_100:100-12     	      12	  87424306 ns/op	    2032 B/op	      21 allocs/op
BenchmarkPoolConcurrency/LockFreePool:_P:C:_100:100
BenchmarkPoolConcurrency/LockFreePool:_P:C:_100:100-12      	      21	  60509495 ns/op	       0 B/op	       0 allocs/op
BenchmarkPoolConcurrency/LockBasedPool:_P:C:_1000:1000
BenchmarkPoolConcurrency/LockBasedPool:_P:C:_1000:1000-12   	      12	  86285176 ns/op	    9312 B/op	      97 allocs/op
BenchmarkPoolConcurrency/LockFreePool:_P:C:_1000:1000
BenchmarkPoolConcurrency/LockFreePool:_P:C:_1000:1000-12    	      16	  78761927 ns/op	     120 B/op	       1 allocs/op
BenchmarkPoolConcurrency/LockBasedPool:_P:C:_10000:10000
BenchmarkPoolConcurrency/LockBasedPool:_P:C:_10000:10000-12 	      10	 105612144 ns/op	   10531 B/op	     109 allocs/op
BenchmarkPoolConcurrency/LockFreePool:_P:C:_10000:10000
BenchmarkPoolConcurrency/LockFreePool:_P:C:_10000:10000-12  	      14	  87560939 ns/op	    2907 B/op	      30 allocs/op
PASS
ok  	github.com/pingcap/tidb/util	17.119s
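    • Benchmark across allocators. The lock-free pool is about 14% faster than the lock-based 32-bit allocator at a concurrency of 10000 (290.7 ns/op vs. 337.7 ns/op), but slower at a concurrency of 10 or less. The 64-bit allocator is the fastest at every concurrency level.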


$ go test -v -benchmem -run="BenchmarkLocalConnIDAllocator" -bench="BenchmarkLocalConnIDAllocator"
goos: darwin
goarch: amd64
pkg: github.com/pingcap/tidb/util
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkLocalConnIDAllocator
BenchmarkLocalConnIDAllocator/Allocator_64_x1
BenchmarkLocalConnIDAllocator/Allocator_64_x1-12         	38799840	        37.77 ns/op	       0 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/Allocator_32(LockBased)_x1
BenchmarkLocalConnIDAllocator/Allocator_32(LockBased)_x1-12         	 9340762	       184.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/LockFreePool_x1
BenchmarkLocalConnIDAllocator/LockFreePool_x1-12                    	 5476623	       223.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/Allocator_64_x3
BenchmarkLocalConnIDAllocator/Allocator_64_x3-12                    	33288285	        36.03 ns/op	       0 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/Allocator_32(LockBased)_x3
BenchmarkLocalConnIDAllocator/Allocator_32(LockBased)_x3-12         	 7140640	       154.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/LockFreePool_x3
BenchmarkLocalConnIDAllocator/LockFreePool_x3-12                    	 5480599	       226.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/Allocator_64_x10
BenchmarkLocalConnIDAllocator/Allocator_64_x10-12                   	26264768	        39.06 ns/op	       0 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/Allocator_32(LockBased)_x10
BenchmarkLocalConnIDAllocator/Allocator_32(LockBased)_x10-12        	 7209702	       164.9 ns/op	       0 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/LockFreePool_x10
BenchmarkLocalConnIDAllocator/LockFreePool_x10-12                   	 4981665	       242.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/Allocator_64_x100
BenchmarkLocalConnIDAllocator/Allocator_64_x100-12                  	27907047	        49.07 ns/op	       0 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/Allocator_32(LockBased)_x100
BenchmarkLocalConnIDAllocator/Allocator_32(LockBased)_x100-12       	 3936932	       302.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/LockFreePool_x100
BenchmarkLocalConnIDAllocator/LockFreePool_x100-12                  	 4827620	       279.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/Allocator_64_x1000
BenchmarkLocalConnIDAllocator/Allocator_64_x1000-12                 	24442515	        43.07 ns/op	       0 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/Allocator_32(LockBased)_x1000
BenchmarkLocalConnIDAllocator/Allocator_32(LockBased)_x1000-12      	 3673219	       317.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/LockFreePool_x1000
BenchmarkLocalConnIDAllocator/LockFreePool_x1000-12                 	 4626400	       289.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/Allocator_64_x10000
BenchmarkLocalConnIDAllocator/Allocator_64_x10000-12                	23900415	        53.18 ns/op	       0 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/Allocator_32(LockBased)_x10000
BenchmarkLocalConnIDAllocator/Allocator_32(LockBased)_x10000-12     	 3269419	       337.7 ns/op	       1 B/op	       0 allocs/op
BenchmarkLocalConnIDAllocator/LockFreePool_x10000
BenchmarkLocalConnIDAllocator/LockFreePool_x10000-12                	 4137858	       290.7 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	github.com/pingcap/tidb/util	27.046s
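
The improvement percentages quoted for the two benchmarks can be re-derived from the ns/op figures in the logs above. The helper `improvement` is a hypothetical name used only for this check.

```go
package main

import "fmt"

// improvement returns how much lower `after` is than `before`, in percent.
func improvement(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// BenchmarkPoolConcurrency, P:C = 10000:10000 (lock-based vs. lock-free).
	fmt.Printf("ring buffer: %.1f%%\n", improvement(105612144, 87560939)) // ring buffer: 17.1%
	// BenchmarkLocalConnIDAllocator, x10000 (Allocator_32(LockBased) vs. LockFreePool).
	fmt.Printf("allocator:   %.1f%%\n", improvement(337.7, 290.7)) // allocator:   13.9%
}
```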

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM

Release note

Support KILL (32 bits) across the whole cluster.

@ti-chi-bot
Member

ti-chi-bot commented Jun 13, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • breezewish

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@pingyu pingyu requested a review from a team as a code owner June 13, 2021 15:51
@pingyu pingyu requested review from XuHuaiyu and removed request for a team June 13, 2021 15:51
@ti-chi-bot ti-chi-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 13, 2021
@pingyu
Contributor Author

pingyu commented Jun 13, 2021

/cc @SunRunAway @breeswish

@github-actions github-actions bot added sig/execution SIG execution sig/sql-infra SIG: SQL Infra labels Jun 13, 2021
@SunRunAway
Contributor

@pingyu I'll take a look by the end of this weekend.

Member

@breezewish breezewish left a comment

Some encapsulation related comments :)

}

// LocalConnIDAllocator32 is local connID allocator for 32bits global connection ID.
type LocalConnIDAllocator32 struct {
Member

LocalPoolIDAllocator? (An ID allocator that allocates IDs from a local pool and returns them to the pool on deallocation.)

Contributor Author

Renamed to LockFreeCircularPool. Please see this comment.

return localConnID
}
}
panic(fmt.Sprintf("Failed to allocate 64bits local connID after retry %v times. Should never happen", LocalConnIDAllocator64RetryCount))
Member

This is not guaranteed to never happen (from the perspective of the allocator itself, since a caller can set a relatively small ID range). How about returning an error and letting the caller decide what to do when allocation fails?

Contributor Author

Updated. PTAL~
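
To illustrate the suggestion in this thread, here is a minimal sketch of an allocator that returns an error instead of panicking once its retry budget is spent. It is not the updated code from the PR: `autoIncAllocator`, `Alloc`, `errNoFreeID` and the field names are hypothetical, and the sketch is deliberately not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoFreeID is a hypothetical sentinel error for this sketch.
var errNoFreeID = errors.New("no free connection ID after retries")

// autoIncAllocator probes IDs in [0, size) in increasing order, wrapping around.
// It is not safe for concurrent use; a real allocator would need atomics or a lock.
type autoIncAllocator struct {
	next    uint64
	size    uint64
	retries int
	exists  func(id uint64) bool // caller-supplied check, similar in spirit to existedChecker
}

// Alloc returns an unused ID, or errNoFreeID once the retry budget is spent.
func (a *autoIncAllocator) Alloc() (uint64, error) {
	for i := 0; i < a.retries; i++ {
		id := a.next % a.size
		a.next++
		if !a.exists(id) {
			return id, nil
		}
	}
	return 0, errNoFreeID
}

func main() {
	// A deliberately tiny ID range in which every ID is already taken.
	used := map[uint64]bool{0: true, 1: true}
	a := &autoIncAllocator{size: 2, retries: 8, exists: func(id uint64) bool { return used[id] }}
	if _, err := a.Alloc(); err != nil {
		fmt.Println("allocation failed, caller decides what to do:", err)
	}
}
```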

)

// SimpleConnIDAllocator is a simple auto-increment allocator.
type SimpleConnIDAllocator struct {
Member

LocalAutoIncIDAllocator? (An ID allocator that simply auto-increments to allocate IDs and never fails. Wrapping will happen, after which the uniqueness of IDs is no longer ensured.)

Contributor Author

I kept the name SimpleConnIDAllocator. Please see this comment.


// Init initiates LocalConnIDAllocator64
func (a *LocalConnIDAllocator64) Init(existedChecker connectionIDExistCheckerFn) {
a.existedChecker = existedChecker
Member

How about maintaining the existence checker inside the allocator? Since it provides both alloc and dealloc, it already knows whether an ID exists.

Contributor Author

Updated. PTAL~
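
As a rough illustration of the idea discussed here, the sketch below keeps the set of live IDs inside the pool itself, so no external existence checker is needed. The type `trackingPool` and its methods are hypothetical, and the mutex-guarded map is an assumption chosen for brevity, not a description of the PR's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// trackingPool hands out IDs in [0, size) and remembers which ones are in use.
type trackingPool struct {
	mu    sync.Mutex
	size  uint64
	next  uint64
	inUse map[uint64]struct{}
}

func newTrackingPool(size uint64) *trackingPool {
	return &trackingPool{size: size, inUse: make(map[uint64]struct{})}
}

// Get returns ok=false when every ID in the range is in use.
func (p *trackingPool) Get() (id uint64, ok bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for i := uint64(0); i < p.size; i++ {
		id = p.next
		p.next = (p.next + 1) % p.size
		if _, taken := p.inUse[id]; !taken {
			p.inUse[id] = struct{}{}
			return id, true
		}
	}
	return 0, false
}

// Put releases an ID so that a later Get can hand it out again.
func (p *trackingPool) Put(id uint64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.inUse, id)
}

func main() {
	p := newTrackingPool(2)
	a, _ := p.Get()
	b, _ := p.Get()
	_, ok := p.Get() // range exhausted
	fmt.Println(a, b, ok)
	p.Put(a)
	c, ok := p.Get() // the released ID becomes available again
	fmt.Println(c, ok)
}
```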

}

// LocalConnIDAllocator64 is local connID allocator for 64bits global connection ID.
type LocalConnIDAllocator64 struct {
Member

LocalRandomIDAllocator? (An ID allocator that allocates IDs by randomly probing the ID range and fails after several attempts.)

Contributor Author

Renamed to AutoIncPool, as the implementation is actually auto-increment. Please see this comment.

@XuHuaiyu XuHuaiyu removed their request for review June 25, 2021 06:19
@pingyu
Contributor Author

pingyu commented Jul 2, 2021

Inspired by @breeswish, I separated the objects into two roles. One is ConnectionIDAllocator, which the Server object uses to get the next connection ID. The other is IDPool, the underlying ID pool used by ConnectionIDAllocator.

ConnectionIDAllocator is specialized as SimpleConnIDAllocator and GlobalConnIDAllocator, used when the GlobalKill feature is disabled and enabled, respectively.

IDPool is specialized as AutoIncPool, which offers an auto-increment ID pool for SimpleConnIDAllocator and for 64-bit global connection ID allocation in GlobalConnIDAllocator, and as LockFreeCircularPool, which provides a lock-free circular pool for 32-bit global connection IDs.

I hope the new class design provides better encapsulation.
PTAL, thanks~ @breeswish

P.S. I placed a class diagram in the source file:
[image: class diagram]
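
The two roles described in this comment could be sketched in Go roughly as follows. The interface names `ConnectionIDAllocator` and `IDPool` come from the comment, but the method sets (`NextID`, `Release`, `Get`, `Put`) and the concrete types `autoIncPool` and `simpleAllocator` are assumptions made for illustration and may differ from the merged code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// IDPool is the underlying source of IDs (method names are assumed).
type IDPool interface {
	Get() (id uint64, ok bool)
	Put(id uint64)
}

// ConnectionIDAllocator is what the server would call (method names are assumed).
type ConnectionIDAllocator interface {
	NextID() uint64
	Release(id uint64)
}

// autoIncPool is a hypothetical auto-increment pool: Get never fails and
// Put is a no-op, so IDs are unique only until the counter wraps.
type autoIncPool struct{ last uint64 }

func (p *autoIncPool) Get() (uint64, bool) { return atomic.AddUint64(&p.last, 1), true }
func (p *autoIncPool) Put(uint64)          {}

// simpleAllocator is a hypothetical allocator that forwards to its pool.
type simpleAllocator struct{ pool IDPool }

func (a *simpleAllocator) NextID() uint64 {
	id, _ := a.pool.Get() // the auto-increment pool always succeeds
	return id
}

func (a *simpleAllocator) Release(id uint64) { a.pool.Put(id) }

func main() {
	var alloc ConnectionIDAllocator = &simpleAllocator{pool: &autoIncPool{}}
	first := alloc.NextID()
	second := alloc.NextID()
	fmt.Println(first, second)
	alloc.Release(first)
}
```

Splitting the roles this way means a different pool can be swapped in behind the same allocator interface without touching the server code.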

@pingyu
Contributor Author

pingyu commented Jul 10, 2021

/run-check_dev_2

@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 13, 2021
@pingyu pingyu force-pushed the global_kill_32_local-conn-id branch from 131d9aa to b3f930b Compare May 16, 2023 01:44
@ti-chi-bot ti-chi-bot bot added approved needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 3, 2023
Signed-off-by: Ping Yu <yuping@pingcap.com>
@pingyu pingyu force-pushed the global_kill_32_local-conn-id branch from ed57ab6 to 88d0cdb Compare June 3, 2023 09:11
@ti-chi-bot ti-chi-bot bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 3, 2023
globalConnID := GCID{
ServerID: serverID,
LocalConnID: (1 << LocalConnIDBits64) - 1 - reservedNo,
Is64bits: true,
Contributor

We always use 64 bits for the reserved conn ID. Will that cause any problems?

Contributor Author

No.

A 64-bit global connection ID with a server ID less than MaxServerID32 (2^11 - 1) is still valid and globally unique.

This scenario also occurs after an upgrade from 32 bits to 64 bits, when the 20-bit local ID pool has been used up while the server ID stays unchanged.
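
To make the bit widths in this reply concrete, here is a sketch that packs an 11-bit server ID and a 20-bit local connection ID into a 32-bit value. Only the two widths come from the discussion. The field order and the use of the lowest bit as a 32/64-bit marker are assumptions for illustration (the design doc is the authority), and `pack32` and `unpack32` are hypothetical helpers.

```go
package main

import "fmt"

const (
	serverIDBits32    = 11 // from the discussion: MaxServerID32 = 2^11 - 1
	localConnIDBits32 = 20 // from the discussion: 20-bit local ID pool
	maxServerID32     = 1<<serverIDBits32 - 1
	maxLocalConnID32  = 1<<localConnIDBits32 - 1
)

// pack32 uses an ASSUMED layout: [serverID:11][localConnID:20][marker:1],
// where marker 0 means "32-bit ID".
func pack32(serverID, localConnID uint32) uint32 {
	return serverID<<(localConnIDBits32+1) | localConnID<<1
}

// unpack32 reverses pack32 under the same assumed layout.
func unpack32(id uint32) (serverID, localConnID uint32, is64 bool) {
	is64 = id&1 == 1
	localConnID = (id >> 1) & maxLocalConnID32
	serverID = (id >> (localConnIDBits32 + 1)) & maxServerID32
	return
}

func main() {
	id := pack32(5, 7)
	serverID, localConnID, is64 := unpack32(id)
	fmt.Println(id, serverID, localConnID, is64)
}
```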

@ti-chi-bot

ti-chi-bot bot commented Jun 3, 2023

@xuyifangreeneyes: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@xuyifangreeneyes
Contributor

/lgtm

@ti-chi-bot

ti-chi-bot bot commented Jun 4, 2023

@xuyifangreeneyes: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@xuyifangreeneyes
Contributor

/approve

@xuyifangreeneyes
Contributor

/lgtm

@xuyifangreeneyes
Contributor

/approve

@xuyifangreeneyes
Contributor

/merge

@breezewish
Member

/lgtm

@ti-chi-bot

ti-chi-bot bot commented Jun 6, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: breezewish, xuyifangreeneyes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [breezewish,xuyifangreeneyes]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot

ti-chi-bot bot commented Jun 6, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-06-04 06:52:38.210073503 +0000 UTC m=+158344.483046286: ☑️ agreed by xuyifangreeneyes.
  • 2023-06-06 07:49:18.029427642 +0000 UTC m=+334544.302400424: ☑️ agreed by breezewish.

@ti-chi-bot ti-chi-bot bot merged commit 6ba0501 into pingcap:master Jun 6, 2023
Labels
approved lgtm release-note sig/execution SIG execution sig/sql-infra SIG: SQL Infra size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/LGT1 Indicates that a PR has LGTM 1.
7 participants