Job timeout doesn't kill the mysql query #4629

rauchy · 2020-02-23T08:45:52Z

I'm using the master branch. During refreshing one of my dashboards, a mysql query reached the timeout which raised the "query execution time limit".
I thought the timeout handler would also kill the related sql query in mysql server while it didn't. So there are lots of stuck sql queries in the mysql server.

So is that possible to also kill the related running query in the database server during handling the timeout?

Thanks.

Jakdaw · 2020-02-10T09:12:12Z

That should happen. What version of redash are you using?

dingweihua · 2020-02-10T09:59:08Z

@Jakdaw
I'm using the latest codes in master branch. It shows "v9.0.0-alpha" from the manager command.
I know the master branch is fully using RQ now. Does v8.x (which uses Celery) have the same result?

arikfr · 2020-02-10T11:34:09Z

@rauchy does the RQ implementation of cancel send Interrupt signal to the job before killing it?

dingweihua · 2020-02-12T01:37:06Z

@arikfr I dig into the code and found that the RQ did send interrupt signal to the job after cancelling the job. However my case is, I didn't manually cancel the job but let the query automatically trigger the timeout exception. But the timeout handler didn't kill the sql query.

rauchy · 2020-02-12T10:22:54Z

@dingweihua thanks for reporting. I will look into the signal propagation issue for MySQL, but meanwhile - you might benefit from updating to the latest master, as it seems that your version had an implicit 180 second timeout, which is no longer imposed.

dingweihua · 2020-02-12T12:38:44Z

@rauchy Yeah i got it. Thanks! You updated the latest default time limit to -1 to mention the user to setup it.

dingweihua · 2020-02-16T10:47:54Z

On considering this issue, I'm trying to use version 8.0 (which uses Celery). Would the sql query be automatically killed after timeout in v8.0? @rauchy

rauchy · 2020-02-16T12:06:17Z

Testing this, it appears that MySQL is receiving signals properly on timeouts and manual cancellation using the latest master. @dingweihua could you elaborate on the issue you are facing?

dingweihua · 2020-02-17T06:32:38Z

Thanks @rauchy
My question is on v8.0. In v8.0, if we wait a MYSQL query (some time-consuming query) to timeout, will our signal processing kill the sql query in the Mysql database server? As we mentioned above, it wouldn't in the master branch.

dingweihua · 2020-02-17T09:47:37Z

@dingweihua thanks for reporting. I will look into the signal propagation issue for MySQL, but meanwhile - you might benefit from updating to the latest master, as it seems that your version had an implicit 180 second timeout, which is no longer imposed.

Thinking about this issue again, i found you've already set the signal processing handler in QueryExecutor's run() method. But in this situation, I don't think our thread could receive the SIGINT signal. SIGINT is translated into a KeyboardInterrupt exception. While it seems that the JobTimeoutException is not a "SIGINT" like signal.

Actually, I debugged my local code (master branch). The run_query Exception was raised (in QueryExecutor.run()) while no SIGINT signal received.

rauchy · 2020-02-18T10:34:20Z

As we mentioned above, it wouldn't in the master branch.

What makes you think that? I ran the following test and saw that the query was cancelled in MySQL:

Start a long-running MySQL query in Redash (Something like select sleep(3600)).
Open a MySQL console and run SHOW FULL PROCESSLIST;, you should see the above query in there:

mysql> SHOW FULL PROCESSLIST;
+----+-----------------+------------------+------+---------+------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Id | User            | Host             | db   | Command | Time | State                  | Info                                                                                                                                                                                            |
+----+-----------------+------------------+------+---------+------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  4 | event_scheduler | localhost        | NULL | Daemon  |   27 | Waiting on empty queue | NULL                                                                                                                                                                                            |
|  8 | root            | localhost        | NULL | Query   |    0 | starting               | SHOW FULL PROCESSLIST                                                                                                                                                                           |
|  9 | user            | 123.12.1.2:49612 | db   | Query   |    7 | User sleep             | /* Username: john@doe.com, Query ID: adhoc, Queue: queries, Job ID: 7ce0d635-9499-4a85-a06b-31215d61c560, Query Hash: 31d8816df61c454e0c28d7373391820d, Scheduled: False */ select sleep(3600)  |
+----+-----------------+------------------+------+---------+------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

Cancel the query in Redash.
Run the same command in your MySQL console and watch that the query process has been terminated:

mysql> SHOW FULL PROCESSLIST;
+----+-----------------+-----------+------+---------+------+------------------------+-----------------------+
| Id | User            | Host      | db   | Command | Time | State                  | Info                  |
+----+-----------------+-----------+------+---------+------+------------------------+-----------------------+
|  4 | event_scheduler | localhost | NULL | Daemon  |   35 | Waiting on empty queue | NULL                  |
|  8 | root            | localhost | NULL | Query   |    0 | starting               | SHOW FULL PROCESSLIST |
+----+-----------------+-----------+------+---------+------+------------------------+-----------------------+
2 rows in set (0.00 sec)

Same thing with setting a short time limit with a query that times out. Are you getting different results?

dingweihua · 2020-02-18T11:40:48Z

As we mentioned above, it wouldn't in the master branch.

What makes you think that? I ran the following test and saw that the query was cancelled in MySQL:

Start a long-running MySQL query in Redash (Something like select sleep(3600)).
Open a MySQL console and run SHOW FULL PROCESSLIST;, you should see the above query in there:

mysql> SHOW FULL PROCESSLIST;
+----+-----------------+------------------+------+---------+------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Id | User            | Host             | db   | Command | Time | State                  | Info                                                                                                                                                                                            |
+----+-----------------+------------------+------+---------+------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  4 | event_scheduler | localhost        | NULL | Daemon  |   27 | Waiting on empty queue | NULL                                                                                                                                                                                            |
|  8 | root            | localhost        | NULL | Query   |    0 | starting               | SHOW FULL PROCESSLIST                                                                                                                                                                           |
|  9 | user            | 123.12.1.2:49612 | db   | Query   |    7 | User sleep             | /* Username: john@doe.com, Query ID: adhoc, Queue: queries, Job ID: 7ce0d635-9499-4a85-a06b-31215d61c560, Query Hash: 31d8816df61c454e0c28d7373391820d, Scheduled: False */ select sleep(3600)  |
+----+-----------------+------------------+------+---------+------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

Cancel the query in Redash.
Run the same command in your MySQL console and watch that the query process has been terminated:

mysql> SHOW FULL PROCESSLIST;
+----+-----------------+-----------+------+---------+------+------------------------+-----------------------+
| Id | User            | Host      | db   | Command | Time | State                  | Info                  |
+----+-----------------+-----------+------+---------+------+------------------------+-----------------------+
|  4 | event_scheduler | localhost | NULL | Daemon  |   35 | Waiting on empty queue | NULL                  |
|  8 | root            | localhost | NULL | Query   |    0 | starting               | SHOW FULL PROCESSLIST |
+----+-----------------+-----------+------+---------+------+------------------------+-----------------------+
2 rows in set (0.00 sec)

Same thing with setting a short time limit with a query that times out. Are you getting different results?

@rauchy Yes, you're right! I have the same idea on your mentioned case. But my case is different from yours. In my case, I didn't manually trigger the cancel operation. My situation is:

Start a long-running MySQL query in Redash
SHOW FULL PROCESSLIST and we'll see the query
Wait for the job timeout as we set the time limit to maybe 1 minute (do not trigger the cancel operation)
From the frontend webpage, we got the error message:

Error running query: Query exceeded Redash query execution time limit.
While the sql query in database server was not killed.

SHOW FULL PROCESSLIST But the sql query is still in the mysql server.

This is my case. In my case, I don't think we can receive the signal so we couldn't do anything on the running sql query. Do you think we should also kill the sql query after facing time limit (without manually cancelling)?

arikfr · 2020-02-18T12:04:19Z

@rauchy when a timeout is reached how the job is being cancelled?

rauchy · 2020-02-18T12:38:48Z

Same thing with setting a short time limit with a query that times out.

@dingweihua I've just double-checked this. As I said, on master you should see the same results when cancelling manually or when cancellation automatically occurs after a timeout.

@arikfr on manual cancellations the work horse will receive a SIGINT and cancel as it used to (by executing the MySQL KILL command), while on timeouts the work horse process would get terminated, which results in MySQL terminating the running queries. In fact, we can consider dropping the KILL execution for manual cancellations, as the work horse would exit anyway.

arikfr · 2020-02-18T12:48:45Z

@rauchy I know that Postgres terminates running queries if the client disconnects, but not sure about MySQL. I'm pretty sure it doesn't (hence the explicit behavior we have). Maybe it was introduced in newer versions of MySQL?

rauchy · 2020-02-18T12:54:07Z

@dingweihua which version of MySQL are you running?

dingweihua · 2020-02-19T02:39:10Z

I'm using MySQL 5.5.5-10.2.12-MariaDB

rauchy · 2020-02-19T11:39:15Z

@dingweihua I've tried this with 10.2.31 and 10.1.44 and they worked as expected. Wondering if this is dependent on the type of query you are running. In my example, I'm using SELECT SLEEP(3600), which puts the MySQL in a user-sleep state - can you try it with such a query?

dingweihua · 2020-02-20T01:56:30Z

@rauchy I've tried SELECT SLEEP(3600) and YES the timeout killed this query in MySQL server! So that means it works well on "SLEEP" query. But when i use a table related query, the SQL query wasn't killed after timeout. You can also try a time-consuming query on some large table and I think the same situation would also happen.

…g proccesses

rauchy · 2020-02-20T13:32:10Z

@dingweihua interesting. Can you try running with https://github.com/getredash/redash/tree/forward-timeout-signals-to-mysql?

dingweihua · 2020-02-21T13:09:48Z

@rauchy Yes, i just tried the branch and it works! I think it's a good fix in your change. By using the signal SIGALRM, you raised the JobTimeoutException so that the mysql runner could catch the exception to cancel the running query thread. Thanks!

rauchy · 2020-02-23T08:47:32Z

@arikfr WDYT about this? We'll probably need to attach to this signal to other multi-threaded runners.

* forward timeout SIGALRMs to MySQL threads in order to kill any running proccesses * no need to attach to SIGALRM as RQ already does that # Conflicts: # redash/query_runner/mysql.py

weekly-digest bot mentioned this pull request Feb 10, 2020

Weekly Digest (3 February, 2020 - 10 February, 2020) #4633

Closed

rauchy self-assigned this Feb 12, 2020

rauchy added this to To do in Switch from Celery to RQ via automation Feb 12, 2020

rauchy added the Backend label Feb 12, 2020

rauchy moved this from To do to In progress in Switch from Celery to RQ Feb 16, 2020

forward timeout SIGALRMs to MySQL threads in order to kill any runnin…

99832e2

…g proccesses

rauchy requested a review from arikfr February 23, 2020 08:46

weekly-digest bot mentioned this pull request Feb 24, 2020

Weekly Digest (17 February, 2020 - 24 February, 2020) #4678

Closed

arikfr approved these changes Feb 24, 2020

View reviewed changes

Switch from Celery to RQ automation moved this from In progress to Reviewer approved Feb 24, 2020

no need to attach to SIGALRM as RQ already does that

383efab

rauchy merged commit 35250d6 into master Feb 24, 2020

Switch from Celery to RQ automation moved this from Reviewer approved to Done Feb 24, 2020

rauchy deleted the forward-timeout-signals-to-mysql branch February 24, 2020 22:16

rauchy mentioned this pull request Feb 24, 2020

Perform cleanup on job timeouts #4681

Merged

2 tasks

weekly-digest bot mentioned this pull request Mar 2, 2020

Weekly Digest (24 February, 2020 - 2 March, 2020) #4702

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Job timeout doesn't kill the mysql query #4629

Job timeout doesn't kill the mysql query #4629

rauchy commented Feb 23, 2020

Jakdaw commented Feb 10, 2020

dingweihua commented Feb 10, 2020

arikfr commented Feb 10, 2020

dingweihua commented Feb 12, 2020

rauchy commented Feb 12, 2020

dingweihua commented Feb 12, 2020

dingweihua commented Feb 16, 2020

rauchy commented Feb 16, 2020

dingweihua commented Feb 17, 2020

dingweihua commented Feb 17, 2020

rauchy commented Feb 18, 2020

dingweihua commented Feb 18, 2020

arikfr commented Feb 18, 2020

rauchy commented Feb 18, 2020

arikfr commented Feb 18, 2020

rauchy commented Feb 18, 2020

dingweihua commented Feb 19, 2020

rauchy commented Feb 19, 2020

dingweihua commented Feb 20, 2020

rauchy commented Feb 20, 2020

dingweihua commented Feb 21, 2020

rauchy commented Feb 23, 2020

Job timeout doesn't kill the mysql query #4629

Job timeout doesn't kill the mysql query #4629

Conversation

rauchy commented Feb 23, 2020

Jakdaw commented Feb 10, 2020

dingweihua commented Feb 10, 2020

arikfr commented Feb 10, 2020

dingweihua commented Feb 12, 2020

rauchy commented Feb 12, 2020

dingweihua commented Feb 12, 2020

dingweihua commented Feb 16, 2020

rauchy commented Feb 16, 2020

dingweihua commented Feb 17, 2020

dingweihua commented Feb 17, 2020

rauchy commented Feb 18, 2020

dingweihua commented Feb 18, 2020

arikfr commented Feb 18, 2020

rauchy commented Feb 18, 2020

arikfr commented Feb 18, 2020

rauchy commented Feb 18, 2020

dingweihua commented Feb 19, 2020

rauchy commented Feb 19, 2020

dingweihua commented Feb 20, 2020

rauchy commented Feb 20, 2020

dingweihua commented Feb 21, 2020

rauchy commented Feb 23, 2020