Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support CTRL-C or kill <connId> to kill a connection/query by implementing global connection IDs #8854

Closed
2 of 3 tasks
morgo opened this issue Dec 27, 2018 · 17 comments · Fixed by #17649 or #20749
Closed
2 of 3 tasks
Assignees
Labels
feature/accepted This feature request is accepted by product managers type/enhancement type/feature-request This is a feature requests on the product

Comments

@morgo
Copy link
Contributor

morgo commented Dec 27, 2018

Description

Is your feature request related to a problem? Please describe:

Currently connection ids are local to TiDB servers, which means that a KILL x must be directed to the correct server, and can not safely be load balanced across the cluster.

This is annoying because it creates affinity to servers in a distributed architecture. It also doesn't retrofit to MySQL so well, since the KILL x syntax can't be supported (using KILL TIDB x to prevent incorrect usage).

Describe the feature you'd like:

I found this note while reading the source code, and it makes a lot of sense to me. I decided to create this FR for discussion and tracking since I couldn't find an existing one open.

parser/ast/misc.go:
        // TiDBExtension is used to indicate whether the user knows he is sending kill statement to the right tidb-server.
        // When the SQL grammar is "KILL TIDB [CONNECTION | QUERY] connectionID", TiDBExtension will be set.
        // It's a special grammar extension in TiDB. This extension exists because, when the connection is:
        // client -> LVS proxy -> TiDB, and type Ctrl+C in client, the following action will be executed:
        // new a connection; kill xxx;
        // kill command may send to the wrong TiDB, because the exists of LVS proxy, and kill the wrong session.
        // So, "KILL TIDB" grammar is introduced, and it REQUIRES DIRECT client -> TiDB TOPOLOGY.
        // TODO: The standard KILL grammar will be supported once we have global connectionID.

Since I believe the connectionID must be 32-bits for protocol compatibility, I assume that the implementation will use the first 8-16 bits for the host, and the remaining 16-24 bits for the local connectionID?

It would be nice to externalize the host portion as a pseudo server_id, since some functions such as UUID_SHORT depend on it being small. The pseudo server_id's can be recycled as TiDB servers are deployed/disappear, with pd ensuring they are unique. This address size of the pseudo server_id would limit the max number of tidb servers deployed simultaneously.

Describe alternatives you've considered:

Using 8 bits for the host seems short, but actually using only 16 bits for the local connection Id is also dangerously low, if you want to make sure new connection Ids allocated do not chase the tail of old connectionIds. I will let someone smarter figure that out :-)

Teachability, Documentation, Adoption, Migration Strategy:

It makes sense to extend SHOW PROCESSLIST to be SHOW [GLOBAL|SESSION] PROCESSLIST. I think the default of global makes more sense

Recommended Skills

  • Golang
  • tidb-server and pd-server

Learning Materials

@zz-jason zz-jason added type/feature-request This is a feature requests on the product and removed type/new-feature labels Mar 31, 2020
@github-actions github-actions bot added this to Need Triage in Feature Request Kanban Mar 31, 2020
@SunRunAway SunRunAway changed the title Global connectionIds Support ctrl c or kill <connId> to kill a connection/query by implementing Global connectionIds May 29, 2020
@SunRunAway SunRunAway changed the title Support ctrl c or kill <connId> to kill a connection/query by implementing Global connectionIds Support CTRL-C or kill <connId> to kill a connection/query by implementing global connection IDs May 29, 2020
@SunRunAway SunRunAway changed the title Support CTRL-C or kill <connId> to kill a connection/query by implementing global connection IDs UCP: Support CTRL-C or kill <connId> to kill a connection/query by implementing global connection IDs May 29, 2020
@breezewish
Copy link
Member

I noticed that since MySQL 5.7 connection ID becomes 64 bit: https://bugs.mysql.com/bug.php?id=65715 So it might not be a hard problem now.

@SunRunAway
Copy link
Contributor

SunRunAway commented May 29, 2020

Tips to implement,

  1. Design how many bits would aserver_id be, and the format of the global connection ID.
  2. Allocate a server_id from pd-server when tidb-server starts, and show the server_id in select * from information_schema.CLUSTER_INFO. (information_schema.CLUSTER_INFO already collects all information of tidb-servers, what we need is an extra column server_id)
  3. The global connection ID should be show in select * from information_schema.CLUSTER_PROCESSLIST
  4. Implement CTRL-C or kill <connId>, whichever a tidb-server receives this command, it forwards to the right tidb-server by global connection ID.
  5. Modify the documentation https://pingcap.com/docs-cn/stable/sql-statements/sql-statement-kill/ and https://pingcap.com/docs-cn/stable/tidb-configuration-file/#compatible-kill-query

@SunRunAway SunRunAway added this to Issue Backlog: Need Triage in SIG Runtime Kanban via automation May 29, 2020
@SunRunAway SunRunAway moved this from Issue Backlog: Need Triage to Backlog: Low Priority in SIG Runtime Kanban May 29, 2020
@SunRunAway SunRunAway added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label May 29, 2020
@pingyu
Copy link
Contributor

pingyu commented May 31, 2020

/pick-up-challenge

@sre-bot
Copy link
Contributor

sre-bot commented May 31, 2020

@pingyu pick up issue success

@wwar
Copy link

wwar commented May 31, 2020

I noticed that since MySQL 5.7 connection ID becomes 64 bit: https://bugs.mysql.com/bug.php?id=65715 So it might not be a hard problem now.

I don't believe this is fully fixed in MySQL 5.7. The server internally uses 64-bit, but the client APIs only expose the lower 32-bits. The fix in 5.7 is that it ensures the lower 32-bits never overlap, which is feasible only in single server context.

Another way to implement this would be to use a design similar to auto_increment works (tidb servers request connection IDs in batches). A batch of 2^16 ids should last for quite a while. I am not sure what is the best way to prevent connection IDs from being reused though, it needs to be possible to have long living connections so that operations like backup are possible.

@breezewish
Copy link
Member

@wwar It is already 64bit in 5.6.9:

Connection IDs now are permitted to be 64-bit values when the server supports them (when built with 64-bit data types), which has these effects:
Connection IDs are logged correctly to the general query log and slow query log.

Note
This change involves a modification to the log tables, so after upgrading to this release, you must run mysql_upgrade and restart the server.

CONNECTION_ID() returns a data type appropriate for values larger than 32 bits.

mysql_thread_id() is unchanged; the client/server protocal has only 4 bytes for the ID value. This function returns an incorrect (truncated) value for connection IDs larger than 32 bits and should be avoided.

mysql_kill() still cannot handle values larger than 32 bits, but to guard against killing the wrong thread now returns an error in these cases:

If given an ID larger than 32 bits, mysql_kill() returns a CR_INVALID_CONN_HANDLE error.

After the server's internal thread ID counter reaches a value larger than 32 bits, it returns an ER_DATA_OUT_OF_RANGE error for any mysql_kill() invocation and mysql_kill() fails.

To avoid problems with mysql_thread_id() and mysql_kill(), do not use them. To get the connection ID, execute a SELECT CONNECTION_ID() query and retrieve the result. To kill a thread, execute a KILL statement.

@SunRunAway SunRunAway moved this from Backlog: Low Priority to Designing in SIG Runtime Kanban Jun 1, 2020
@siddontang
Copy link
Member

Seem the connection ID in the handshake protocol is still 32 bits, see https://dev.mysql.com/doc/internals/en/connection-phase-packets.html#packet-Protocol::Handshake

we need to take care of it, e.g. the higher 32 bits is TiDB server ID, the lower is connection ID, we can return the lower part in the handshake protocol.

@breezewish
Copy link
Member

@siddontang Good catch! I further took a look at MySQL client implementation, and found that this 32-bit connection id in the handshake protocol will be finally used to send KILL statement 1 2 3. This means we may need some further designs to deal with the connection id in the protocol.

@pingyu
Copy link
Contributor

pingyu commented Jun 3, 2020

Thanks to all the friends and discussions here and slack.
After gather the information by far, I write a draft design doc at #17649.
PTAL, thanks~

@sre-bot
Copy link
Contributor

sre-bot commented Jun 7, 2020

This pick has been automatically canceled after more than a week.

@disksing
Copy link
Contributor

Hi @pingyu , I tracked this issue from the PD PR. IMO, it is a bad idea to use PD's allocID to generate tidb-server ID. The way allocID works is to allocate increasing IDs, and it will skip 1000 when restarting or leader switching. And what may be more needed here is an id similar to the Linux process number, just to ensure that it is not repeated for a period of time. In particular, PD's allocID does not guarantee that it will not exceed the int32 range.
May I ask the slack channel you were discussing? I might need some more context.

Feature Request Kanban automation moved this from Accepted to Finished May 6, 2021
@SunRunAway SunRunAway reopened this May 7, 2021
SIG Runtime Kanban automation moved this from Develop Finished (This Week) to In Progress May 7, 2021
Feature Request Kanban automation moved this from Finished to Need Triage May 7, 2021
@tisonkun tisonkun changed the title UCP: Support CTRL-C or kill <connId> to kill a connection/query by implementing global connection IDs Support CTRL-C or kill <connId> to kill a connection/query by implementing global connection IDs Jul 1, 2021
@tisonkun
Copy link
Contributor

tisonkun commented Jul 1, 2021

@SunRunAway @pingyu may you summary the tasks done so far and remaining work? I can see this issue being reopened several times and doesn't have a big picture on how we push it to finally finished.

@tisonkun tisonkun removed the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jul 1, 2021
@pingyu
Copy link
Contributor

pingyu commented Jul 2, 2021

@SunRunAway @pingyu may you summary the tasks done so far and remaining work? I can see this issue being reopened several times and doesn't have a big picture on how we push it to finally finished.

We separate the issue as the following tasks:

@tisonkun
Copy link
Contributor

tisonkun commented Jul 2, 2021

@pingyu Thanks for your information! I'll create TODO as subtasks for this issue to you.

@morgo
Copy link
Contributor Author

morgo commented Aug 15, 2022

Global kill has been added, and is now enabled by default. Closing this.

@morgo morgo closed this as completed Aug 15, 2022
Feature Request Kanban automation moved this from Need Triage to Finished Aug 15, 2022
pingyu added a commit to pingyu/tidb that referenced this issue Mar 25, 2023
fix CI.

Signed-off-by: Ping Yu <yuping@pingcap.com>
pingyu added a commit to pingyu/tidb that referenced this issue Apr 2, 2023
ID pools for global kill 32bit
ti-chi-bot bot pushed a commit that referenced this issue Jul 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/accepted This feature request is accepted by product managers type/enhancement type/feature-request This is a feature requests on the product
Projects
SIG Runtime Kanban
  
In Progress
Development

Successfully merging a pull request may close this issue.