Killing connections needs cooperation from the to-be-killed connection #24031

dveeden · 2021-04-14T14:07:18Z

Bug Report

1. Minimal reproduce step (Required)

Run TiDB playground.
Connect two sessions with mysql --host 127.0.0.1 --port 4000 -u root -p
From connection A: run SHOW PROCESSLIST and get the connection ID of the other connection (mysql reports its own connection id when starting up)
Now from connection A issue: KILL TIDB CONNECTION <id> where ID is the connection ID of the other connection.
Now run SHOW PROCESSLIST again.

Note that this is with a single tidb node.

2. What did you expect to see? (Required)

The killed connection gone from the processlist.

3. What did you see instead (Required)

The other connection still being in the processlist until it tries to run a query.

Once a query like do 1 is attempted on connection B the connection is gone from the processlist and connection B shows ERROR 2013 (HY000): Lost connection to MySQL server during query

4. What is your TiDB version? (Required)

mysql> select tidb_version()\G
*************************** 1. row ***************************
tidb_version(): Release Version: v5.0.0
Edition: Community
Git Commit Hash: bdac0885cd11bdf571aad9353bfc24e13554b91c
Git Branch: heads/refs/tags/v5.0.0
UTC Build Time: 2021-04-06 16:36:29
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
1 row in set (0.00 sec)

morgo · 2021-04-14T15:18:16Z

This is important in the case of max_connections, since idle connections can consume slots and deny service to new connections.

I assume the fix is to have some idle loop to check if the connection is killed.

cosven · 2021-04-15T01:51:05Z

/severity major

dveeden · 2021-04-15T06:49:52Z

I tried 4.0.12 and that version behaves the same.

dveeden · 2021-04-15T06:51:18Z

Besides what @morgo says, this also makes the KILL command seem less reliable, reducing the trust users may have in the product.

xinwu5 · 2021-04-19T12:47:16Z

Running version 5.7.25-TiDB-v5.0.0, kill query just does not work on alter statement.

|  113 | root        | xx.xx.xx.xx.xx  | test | Query   | 1248 | autocommit | alter table xxxxx

mysql> KILL TIDB CONNECTION 113;
Query OK, 0 rows affected (0.11 sec)

|  113 | root        | xx.xx.xx.xx.xx  | test | Query   | 1494 | autocommit | alter table xxxxxx

dveeden · 2021-04-19T15:01:10Z

It looks like this kills the connection at the first interaction after the running query, rather than stopping the query directly as expected.

morgo · 2021-04-19T15:50:26Z

> Running version 5.7.25-TiDB-v5.0.0, kill query just does not work on alter statement.
>
> |  113 | root        | xx.xx.xx.xx.xx  | test | Query   | 1248 | autocommit | alter table xxxxx
>
> mysql> KILL TIDB CONNECTION 113;
> Query OK, 0 rows affected (0.11 sec)
>
> |  113 | root        | xx.xx.xx.xx.xx  | test | Query   | 1494 | autocommit | alter table xxxxxx

I think this is a different issue, which is that the query being killed is DDL (which runs on the DDL owner, the state in the processlist here is just polling the progress of DDL). Try killing SELECT SLEEP(60) for example.

For killing DDL, there is ADMIN CANCEL DDL. I am not sure if it is a feature request or a bug, but DDL should be killable with the kill statement.

xinwu5 · 2021-04-19T18:47:36Z

> For killing DDL, there is ADMIN CANCEL DDL. I am not sure if it is a feature request or a bug, but DDL should be killable with the kill statement.

ADMIN CANCEL DDL manages to kill the query. Agree that kill command should be able to kill the DDL.

morgo · 2021-04-19T20:31:58Z

I've forked the KILL DDL issue to #24144

morgo · 2021-08-05T01:38:19Z

I took a look at this today. The issue is as described; killing a connection sets cc.status to connStatusWaitShutdown, which is read on the next interaction in that session, and then the connection is killed.

My suggestion to fix it is the following:

SHOW PROCESSLIST (and infoschema) is modified to show the State as Killed.
Instead of setting the read timeout to waitTimeout, the code is instead modified to have a hard coded 2s timeout, but loops for up to waitTimeout retrying a read:

tidb/server/conn.go

Lines 933 to 951 in 0c72834

    // Usually, client connection status changes between [dispatching] <=> [reading].
    // When some event happens, server may notify this client connection by setting
    // the status to special values, for example: kill or graceful shutdown.
    // The client connection would detect the events when it fails to change status
    // by CAS operation, it would then take some actions accordingly.
    for {
        if !atomic.CompareAndSwapInt32(&cc.status, connStatusDispatching, connStatusReading) ||
            // The judge below will not be hit by all means,
            // But keep it stayed as a reminder and for the code reference for connStatusWaitShutdown.
            atomic.LoadInt32(&cc.status) == connStatusWaitShutdown {
            return
        }
        cc.alloc.Reset()
        // close connection when idle time is more than wait_timeout
        waitTimeout := cc.getSessionVarsWaitTimeout(ctx)
        cc.pkt.setReadTimeout(time.Duration(waitTimeout) * time.Second)
        start := time.Now()
        data, err := cc.readPacket()

We have a similar problem with the sleep function, which handles interruption by looping:

tidb/expression/builtin_miscellaneous_vec.go

Lines 336 to 355 in 0c72834

    func doSleep(secs float64, sessVars *variable.SessionVars) (isKilled bool) {
        if secs <= 0.0 {
            return false
        }
        dur := time.Duration(secs * float64(time.Second.Nanoseconds()))
        ticker := time.NewTicker(10 * time.Millisecond)
        defer ticker.Stop()
        timer := time.NewTimer(dur)
        for {
            select {
            case <-ticker.C:
                if atomic.CompareAndSwapUint32(&sessVars.Killed, 1, 0) {
                    timer.Stop()
                    return true
                }
            case <-timer.C:
                return false
            }
        }
    }

dveeden · 2021-08-05T10:26:02Z

@morgo minor remark: MySQL uses Killed instead of killed. Would be good to keep this identical if possible.

bb7133 · 2021-09-26T03:13:51Z

Hi @morgo

> Instead of setting the read timeout to waitTimeout, the code is instead modified to have a hard coded 2s timeout, but loops for up to waitTimeout retrying a read:

Is it better to set the timeout to `NOW`? I saw some code with a similar purpose, used as a workaround to interrupt a blocking read from the connection:

https://github.com/google/mtail/pull/497/files#diff-5c07b612b388d4ea0b64ab060e63777999598c659b7deb73ca44a7f2564ad4cdR17

djshow832 · 2021-09-26T07:59:01Z

Modifying waitTimeout sounds similar to cancelling the context directly. So here's another solution:

Every session/connection has a context. Killing the session/connection cancels the context.
Every query has a context, which inherits from the context belonging to the session/connection. Killing the query cancels the context.

bb7133 · 2021-09-26T09:06:43Z

@djshow832 context is the straightforward (and 'gopher') way. I've checked the code and it looks like 'cancels the context' is applicable:

tidb/server/conn.go

Line 929 in e3e1fb9

if atomic.LoadInt32(&cc.status) != connStatusShutdown {

tiancaiamao · 2021-09-27T11:07:51Z

> Modifying waitTimeout sounds similar to cancelling the context directly. So here's another solution:
>
> Every session/connection has a context. Killing the session/connection cancels the context.
>
> Every query has a context, which inherits from the context belonging to the session/connection. Killing the query cancels the context.

Cancel is problematic because there are many concurrent executors and they may not handle the cancel well, resulting in resource leaks or even blocked queries.

tiancaiamao · 2021-09-27T11:09:38Z

I have an internal document on why cancel doesn't work

https://docs.google.com/document/d/1vnc2fsUkE_j0755-247lZL_KtD0FkZLEMe1ULX4p7iI/edit#

And that's why we check the killed flag
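
As a rough illustration of what checking a killed flag at safe points can look like with concurrent workers (an assumption about the general pattern, not a summary of the linked document or of TiDB's executors): the producer stops sending once the flag is set, and workers keep draining their input so nobody blocks, then always run their cleanup. `run` and all names in it are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// run feeds numItems items to numWorkers workers and sets the killed flag
// just before item killAt would be sent. It returns how many items were
// processed and how many workers ran their cleanup.
func run(numWorkers, numItems, killAt int) (processed, released int32) {
	var killed uint32
	in := make(chan int)
	var wg sync.WaitGroup

	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer atomic.AddInt32(&released, 1) // stand-in for releasing held resources
			for range in {
				if atomic.LoadUint32(&killed) == 1 {
					continue // killed: keep draining so the producer never blocks, skip the work
				}
				atomic.AddInt32(&processed, 1)
			}
		}()
	}

	for i := 0; i < numItems; i++ {
		if i == killAt {
			atomic.StoreUint32(&killed, 1) // simulate KILL arriving mid-query
		}
		if atomic.LoadUint32(&killed) == 1 {
			break // the producer stops at its own safe point
		}
		in <- i
	}
	close(in)
	wg.Wait()
	return processed, released
}

func main() {
	processed, released := run(4, 100, 10)
	fmt.Println("processed at most 10 items:", processed <= 10, "workers cleaned up:", released)
}
```

Each goroutine decides for itself where it is safe to stop, which is the property that is harder to guarantee when an external cancel interrupts it at an arbitrary blocking point.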

bb7133 · 2021-11-05T13:12:47Z

> I have an internal document on why cancel doesn't work
>
> https://docs.google.com/document/d/1vnc2fsUkE_j0755-247lZL_KtD0FkZLEMe1ULX4p7iI/edit#
>
> And that's why we check the killed flag

I see, thanks for the explanation.

bb7133 · 2021-11-05T13:15:08Z

@tiancaiamao So I think setting waitTimeout is a better way?

yiwen92 · 2021-11-08T10:51:19Z

After an internal discussion, we finally decided to filter out killed connections in SHOW PROCESSLIST. This keeps the same behaviour as v4.0.10 and earlier.
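
Filtering killed connections out of the processlist could look roughly like the sketch below. It assumes, as described earlier in this thread, that a killed connection is marked with connStatusWaitShutdown; the `connInfo` struct, the numeric status values, and `visibleProcesses` are invented for illustration and are not the code from the actual change.

```go
package main

import "fmt"

// Hypothetical status values; the names echo those quoted in this thread,
// but the numeric values are made up for the example.
const (
	connStatusDispatching = iota
	connStatusReading
	connStatusWaitShutdown
)

// connInfo is a simplified stand-in for a server-side connection record.
type connInfo struct {
	ID     uint64
	User   string
	status int32
}

// visibleProcesses returns the connections a processlist should display,
// omitting those that have been killed but not yet torn down.
func visibleProcesses(conns []connInfo) []connInfo {
	out := make([]connInfo, 0, len(conns))
	for _, c := range conns {
		if c.status == connStatusWaitShutdown {
			continue
		}
		out = append(out, c)
	}
	return out
}

func main() {
	conns := []connInfo{
		{ID: 3, User: "root", status: connStatusDispatching},
		{ID: 5, User: "root", status: connStatusWaitShutdown}, // killed, still idle
		{ID: 7, User: "root", status: connStatusReading},
	}
	for _, c := range visibleProcesses(conns) {
		fmt.Println(c.ID, c.User)
	}
}
```

This hides the killed connection from the list, but the underlying socket and its connection slot stay in use until the client next interacts with the server.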

github-actions · 2021-11-12T04:45:19Z

Please check whether the issue should be labeled with 'affects-x.y' or 'fixes-x.y.z', and then remove 'needs-more-info' label.

bb7133 · 2021-11-25T02:31:05Z

@morgo I will mark it as 'enhancement' since the current behavior is expected. We can still keep this issue open as a backlog item.

yiwen92 · 2022-04-01T03:15:21Z

This needs to be cherry-picked to all historical versions.

dveeden added the type/bug label Apr 14, 2021

morgo added the sig/sql-infra label Apr 14, 2021

ti-chi-bot added the severity/major label Apr 15, 2021

morgo mentioned this issue Apr 19, 2021: KILL TIDB or global KILL should cancel DDL #24144 (Closed)

morgo self-assigned this Aug 2, 2021

bb7133 assigned yiwen92 and unassigned morgo Sep 26, 2021

yiwen92 mentioned this issue Oct 28, 2021: server: fix show problem for kill tidb connection (#24031) #29212 (Merged)

ti-chi-bot closed this as completed in #29212 Nov 12, 2021

ti-chi-bot pushed a commit that referenced this issue Nov 12, 2021: server: fix show problem for kill tidb connection (#24031) (#29212), commit 52c6890

bb7133 removed the severity/major and needs-more-info labels Nov 26, 2021

bb7133 added a commit to bb7133/tidb that referenced this issue Mar 3, 2022: Revert "server: fix show problem for kill tidb connection (pingcap#24031) (pingcap#29212)", commit 97ac2a7. This reverts commit 52c6890.

bb7133 mentioned this issue Mar 3, 2022: server: a better way to handle killed connection #32809 (Merged)

ti-chi-bot closed this as completed in #32809 Mar 15, 2022

ti-chi-bot pushed a commit that referenced this issue Mar 15, 2022: server: a better way to handle killed connection (#32809), commit 403dcfd (close #24031, ref #29212)

bb7133 added the affects-5.3 and affects-5.4 labels Apr 1, 2022

qiancai mentioned this issue Apr 6, 2022: add 6.0 release notes pingcap/docs#7987 (Merged)

ti-srebot mentioned this issue Sep 15, 2022: server: a better way to handle killed connection (#32809) #37834 (Merged)

ti-chi-bot pushed a commit that referenced this issue Sep 15, 2022: server: a better way to handle killed connection (#32809) (#37834), commit cd5db42

This was referenced Nov 4, 2022: server: a better way to handle killed connection (#32809) #38887 (Merged) and #38888 (Merged)

ti-chi-bot added a commit that referenced this issue Nov 4, 2022: server: a better way to handle killed connection (#32809) (#38888), commit 5d411f5

ti-chi-bot added a commit that referenced this issue Nov 4, 2022: server: a better way to handle killed connection (#32809) (#38887), commit 5410d1b
