Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement CommandTimeout #67

Closed
caleblloyd opened this issue Sep 27, 2016 · 35 comments
Closed

Implement CommandTimeout #67

caleblloyd opened this issue Sep 27, 2016 · 35 comments
Assignees
Milestone

Comments

@caleblloyd
Copy link
Contributor

1. Implement DefaultCommandTimeout in the connection string

Connection String Documentation

2. Implement CommandTimeout in MySqlCommand

MySqlCommand cmd = new MySqlCommand();
cmd.CommandTimeout = 60;

MySqlCommand Documentation:

Implementation Notes:

MySQL Connector/Net 6.2 introduced timeouts that are aligned with how Microsoft handles SqlCommand.CommandTimeout. This property is the cumulative timeout for all network reads and writes during command execution or processing of the results. A timeout can still occur in the MySqlReader.Read method after the first row is returned, and does not include user processing time, only IO operations. The 6.2 implementation uses the underlying stream timeout facility, so is more efficient in that it does not require the additional timer thread as was the case with the previous implementation.

It sounds like Oracle's implementation of CommandTimeout is setting Socket.SendTimeout and Socket.ReceiveTimeout. They start at the timeout value (e.g. 30 seconds) and subtract only time spent in I/O. This becomes the send or receive timeout for the next socket command.

@caleblloyd
Copy link
Contributor Author

I have a WIP of this here: caleblloyd@d061332

Commands are properly timing out, but the next command that goes over the same connection hits issues with reading packets out of order. Should the default logic be to reset a connection that hits a command timeout?

@bgrainger
Copy link
Member

bgrainger commented Oct 3, 2016

This is potentially going to be complicated.

I would expect a CommandTimeout to behave the same as passing in a CancellationToken that cancels after a timeout. (EDIT: I meant for the internals of the library; externally, it will throw a different exception type.) As per the notes on #3 this needs to involve KILL QUERY to free up server resources (and make the connection usable again). In that situation, the library shouldn't cancel socket operations, but actually read the final response from the server and leave the connection in a good state (so another command can be executed, it can be returned to the pool, etc.).

@caleblloyd
Copy link
Contributor Author

caleblloyd commented Oct 3, 2016

MariaDB/J opens another connection to run the KILL QUERY, code is here.

You're correct in #3, load balanced connections could come into play here. If it's round-robin DNS balanced connections, we could store the resolved IP Address of the server in the Connection. If it's Layer 4 load balancing, there's no way to guarantee you get the same server (some router with a consistent IP address is maintaining a NAT Table to a cluster of MySQL servers).

If the driver has to open another connection anyways, why not just dispose the timed-out socket first, and replace it with a fresh socket? As long as MySQL internally cleans up the transaction on a disposed socket, this approach should work. This approach could be implemented in MySqlSession while the connection is still "leased out" of the pool, so that the socket that is eventually returned is in a good state.

@bgrainger
Copy link
Member

why not just dispose the timed-out socket first, and replace it with a fresh socket?

This would lose all session variables, temporary tables, etc. If the client is able to keep using the same MySqlConnection object, then they would expect it to have the same logical state. (I don't know if MySql.Data implements this; I assume it does.)

I'm thinking that the connection pool should have n dedicated KILL QUERY threads (one per IP address per comma-delimited hostname) even though this runs the risk of exceeding the maximum number of connections to the server if the user configures the maximum pool size to be exactly the same as the maximum number of server connections. (This seems a rare-enough scenario that it could be listed as a known issue.)

@caleblloyd
Copy link
Contributor Author

MySql.Data does implement KILL QUERY as well, code is here. It looks like they do not attempt to connect to a specific IP.

I'm thinking that the connection pool should have n dedicated KILL QUERY threads (one per IP address per comma-delimited hostname)

I like the idea of saving the IP Address of the server that was connected to in the Session and trying to reconnect to this specific server. I think this should be a one-off connection outside of the pool when KILL QUERY is needed though. It's a lot of extra effort to try to track the connection after the one-off command, MySql.Data and MariaDB/J do not.

If it's Layer 4 load balancing, there's no way to guarantee you get the same server

Does the Server ID get sent from the server when opening a connection? If so we should save this in the Session also.

As a last resort, if we can't open a new connection to the server (server connection limit hit or server went down without a clean shutdown), or server IDs do not match (layer 4 load balancing) should we dispose the socket and switch it out with a new one? Would this require a different exception message?

@bgrainger
Copy link
Member

I think this should be a one-off connection outside of the pool when KILL QUERY is needed though. It's a lot of extra effort to try to track the connection after the one-off command, MySql.Data and MariaDB/J do not.

Sounds good.

Does the Server ID get sent from the server when opening a connection?

Not sure what you mean by "Server ID". There is a server version and a (server-local) connection ID in the initial handshake packet. AFAIK there's no value that uniquely identifies a server.

should we dispose the socket and switch it out with a new one?

I think it would be best to throw a MySqlException that explains that the connection is no longer usable and leave the session in a disconnected state. It would have to be up to the user to decide how to recover: is opening a new MySqlConnection sufficient, or is this error condition unrecoverable (from the client's perspective) due to abandoned temporary tables, etc.

@caleblloyd
Copy link
Contributor Author

caleblloyd commented Oct 3, 2016

Not sure what you mean by "Server ID"

The server_id global variable is used in replicated setups. It is required to be unique for replication to start, but looks like it's not sent in the initial handshake packet.

There is a "Replication" option that can be set in the connection string. If this was set, we could run a SELECT @@server_id query as part of the connection open process.

The worry is that without comparing server_id before killing a query from a separate connection on a replicated setup, we could actually be killing a valid query on a different server that has the same connection_id.

@bgrainger
Copy link
Member

Some well-written bugs against Connector/NET that should be tested when this is implemented: 86009, 86010, 85978.

@bgrainger
Copy link
Member

The max_execution_time server variable controls the length of time the server spends running a SELECT query; unfortunately it doesn't apply to UPDATEs or INSERTs, etc., and is only available on >5.7.8.

@nvivo
Copy link

nvivo commented Aug 2, 2017

While this is not supported, what is the current behavior? It just runs forever?

@bgrainger
Copy link
Member

bgrainger commented Aug 2, 2017

Yes, it will run until the server returns results. A workaround is:

/* using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(YOUR_TIMEOUT))
{
    await command.ExecuteNonQueryAsync(cts.Token); // <-- will throw after timeout
} */

(You can see more examples in CancelTests.cs.)

@bgrainger
Copy link
Member

bgrainger commented Aug 3, 2017

My best understanding of this property is that it essentially automatically sets a timeout on every public method that the client can call. From SqlCommand.CommandTimeout:

This property is the cumulative time-out (for all network packets that are read during the invocation of a method) for all network reads during command execution or processing of the results. A time-out can still occur after the first row is returned, and does not include user processing time, only network read time.

For example, with a 30 second time out, if Read requires two network packets, then it has 30 seconds to read both network packets. If you call Read again, it will have another 30 seconds to read any data that it requires.

It could be considered obsolete in an async world where methods take CancellationToken parameters, as the client could pass in new CancellationTokenSource(myTimeSpan).Token. However, I can see the benefit as a convenience method for not having to repeatedly create a new CTS for each individual method call.

Interestingly, SqlCommand says this (but doesn't mention new-style async methods):

The CommandTimeout property will be ignored during asynchronous method calls such as BeginExecuteReader.

What I dislike about it is that it will add some (probably non-trivial) amount of overhead to track the remaining time allotted to each method call and set a timer that will call MySqlConnection.Cancel(MySqlCommand), but that timer will almost never fire (assuming that most users don't typically execute extremely slow queries assuming that network failure is rare).

My vote would be to make a breaking change with MySql.Data and set DefaultCommandTimeout = 0 (that is, methods never time out by default), and let users opt in to the performance hit of having MySqlConnector track timeouts. (This is essentially our current default, and it doesn't appear to be a significant problem for most current users.) The code would only set (and cancel) the timers if MySqlCommand.CommandTimeout (inherited from DefaultCommandTimeout) is non-zero.

@bgrainger
Copy link
Member

MySql.Data has an interesting feature where it kills the query, silently swallows any results returned by it (so the connection/session is still in a valid state), then throws the timeout exception. This may be interesting to implement.

@nvivo
Copy link

nvivo commented Aug 3, 2017

That's fine for me. I just migrated my entire cluster to MySqlConnector and EFCore now, so I had to remove the "Default Command Timeout" overrides from connection strings. I have a lot of queries that take 10 minutes or more to complete, and in all cases, I was adding more time, guess I never thought about not having them at all. Didn't know about async not using them.

For now I'll just take that into account and move to cancellation tokens where necessary. Thanks!

bgrainger added a commit to bgrainger/MySqlConnector that referenced this issue Aug 4, 2017
DefaultCommandTimeout is set to 0 (i.e., no timeout) instead of 30 as in
MySql.Data. This should remove unnecessary performance overhead of
setting a timer for every single operation.

This library also starts the timer from the beginning of the public
method call, instead of the cumulative total of network I/O time.
bgrainger added a commit to bgrainger/MySqlConnector that referenced this issue Aug 4, 2017
DefaultCommandTimeout is set to 0 (i.e., no timeout) instead of 30 as in
MySql.Data. This should remove unnecessary performance overhead of
setting a timer for every single operation.

This library also starts the timer from the beginning of the public
method call, instead of the cumulative total of network I/O time.
@bgrainger
Copy link
Member

I just migrated my entire cluster to MySqlConnector and EFCore now

👍

so I had to remove the "Default Command Timeout" overrides from connection strings

I'm working on a patch that will accept this connection string setting.

I have a lot of queries that take 10 minutes or more to complete, and in all cases, I was adding more time, guess I never thought about not having them at all.

You could set DefaultCommandTimeout=0 in the connection string to disable timeouts entirely. (This isn't currently supported by MySqlConnector but should be in the next version.) OTOH, I can see the value in saying "this query might take a long time to run, but I'd prefer an exception rather than a hang if it takes more than an hour".

Didn't know about async not using them.

To clarify, that was the Microsoft SQL Client documentation. I haven't checked the code but I would assume that MySql.Data does use CommandTimeout on its async methods because they simply delegate to the sync methods.

bgrainger added a commit to bgrainger/MySqlConnector that referenced this issue Aug 4, 2017
DefaultCommandTimeout is set to 0 (i.e., no timeout) instead of 30 as in
MySql.Data. This should remove unnecessary performance overhead of
setting a timer for every single operation.

This library also starts the timer from the beginning of the public
method call, instead of the cumulative total of network I/O time.
@kishoretvk
Copy link

hey what is the solution, can we download the dev repo and change the timeout1?

i tried using ado .net too but even that too defaults too 30 seconds

@bgrainger
Copy link
Member

can we download the dev repo and change the timeout1?

Not really, because it's not implemented yet. (Right now, you shouldn't be observing any timeouts happening, because there's no code in place to make them happen.)

@nvivo
Copy link

nvivo commented Sep 22, 2017

I have been noticing some queries running forever here, no idea why. It seems having the timeout could act like a failsafe for these queries so at least I can notice them.

I've tested adding a timeout with a CancellationToken, but I noticed that it doesn't fail with an exception when the token is cancelled. This causes the fail to be silent. I tried to handle this manually with something like this:

CancellationTokenSource cts = null;

if (cmd.CommandTimeout > 0)
    cts = new CancellationTokenSource(cmd.CommandTimeout * 1000);

try
{
    var result = cmd.ExecuteNonQueryAsync(cts.Token);

    if (cts.IsCancellationRequested)
        throw new MySqlException("command timeout");

    return result;
}
finally
{
    if (cts != null)
        cts.Dispose();
}

But the throw must be inside the command executor or even deeper in the datareader, otherwise the token might be cancelled after the query finishes but before testing for the token.

Also, using the token has the downside of not being able to use Dapper to set the timeout like: await cn.QueryAsync("...", commandTimeout: 30) and forcing creating the command manually.

bgrainger added a commit to bgrainger/MySqlConnector that referenced this issue Sep 25, 2017
DefaultCommandTimeout is set to 0 (i.e., no timeout) instead of 30 as in
MySql.Data. This should remove unnecessary performance overhead of
setting a timer for every single operation.

This library also starts the timer from the beginning of the public
method call, instead of the cumulative total of network I/O time.
bgrainger added a commit to bgrainger/MySqlConnector that referenced this issue Oct 8, 2017
DefaultCommandTimeout is set to 0 (i.e., no timeout) instead of 30 as in
MySql.Data. This should remove unnecessary performance overhead of
setting a timer for every single operation.

This library also starts the timer from the beginning of the public
method call, instead of the cumulative total of network I/O time.
bgrainger added a commit to bgrainger/MySqlConnector that referenced this issue Oct 13, 2017
DefaultCommandTimeout is set to 0 (i.e., no timeout) instead of 30 as in
MySql.Data. This should remove unnecessary performance overhead of
setting a timer for every single operation.

This library also starts the timer from the beginning of the public
method call, instead of the cumulative total of network I/O time.
@bgrainger bgrainger self-assigned this Oct 15, 2017
@bgrainger
Copy link
Member

I've realised that most of my earlier analysis is incorrect.

There are two important types of cancellation: logical and physical. The logical cancellation (triggered by passing a CancellationToken to various methods) executes a KILL QUERY and keeps the connection open. The physical cancellation is necessary to recover from network failures. See this article where the problem of half-open connections is discussed and four solutions are given. We need to implement solution 3: "Explicit timer assuming the worst".

DefaultCommandTimeout and CommandTimeout are necessary to prevent the client hanging forever expecting a response when there is a network failure (or crashed server?). Its implementation has to assume that the TCP connection is no longer valid and must tear it down. This is in explicit contrast to the logical cancellation, which aborts the current query but keeps the active connection (with its session variables, temp tables, etc.).

@bgrainger
Copy link
Member

Implemented in 0.29.0.

@miracles-happen
Copy link

Dear Bradley,

Kindly please clarify what exactly happens when Command is timed out?

Assuming that we are executing heavy UPDATE on some of MySQL table. Whether MySQL command will succeed from the MySQL point of view?

What exactly happenned in connector on timeout detection? Whether KILL QUERY will be send to MySQL? (and, if I'm understanding correctly, this can potentially lead to data corruption)
Or MySQL command continues its execution untill completion and only MySql connector consumer will receive an exception?

Thank you in advance.

@bgrainger
Copy link
Member

Kindly please clarify what exactly happens when Command is timed out?

The client will abort the connection. I don't know if this means a TCP RST packet is sent or not; you could look with Wireshark to find out.

Assuming that we are executing heavy UPDATE on some of MySQL table. Whether MySQL command will succeed from the MySQL point of view?

I don't think the server behaviour is precisely defined in this situation. It might even depend on the exact version of MySQL Server you're using, or whether you're using MySQL Server or MariaDB or another MySQL variant.

Whether KILL QUERY will be send to MySQL?

No, this will not happen with command timeout. Command Timeout exists to prevent the client from hanging if the server becomes unreachable, so it aborts the current connection instead of executing more server commands. (If you do want KILL QUERY, use an Async method overload that takes a CancellationToken; when the token is cancelled, KILL QUERY will be sent to gracefully cancel the executing command.)

(and, if I'm understanding correctly, this can potentially lead to data corruption)

I haven't heard about this, but I haven't done a lot of reading about how KILL QUERY is implemented, e.g., does the server "roll back" the command as though it were in an implicit transaction, or does it commit a partial update? You should consult your MySQL Server manual to see if its behaviour is documented.

Or MySQL command continues its execution untill completion and only MySql connector consumer will receive an exception?

This seems possible, but I have never tested it to see. Again, consult your MySQL Server manual to see if its behaviour is documented.

@miracles-happen
Copy link

Ok, thank you for your response.

@penguinawesome
Copy link

hi @bgrainger , we are encountering the same issue with this library, the query keeps on running in the server while in the client side, it has been time out already (since we set the DEFAULT COMMAND TIMEOUT in the connection string to just 10 seconds). Basically we also want to cancel and cut very long running queries based on the timeout that we set and it should be reflected in the server side as well.

How can we implement this in this library?

@ghost
Copy link

ghost commented Sep 22, 2020

for this i've used set max session timeout, or the query level equivalent.
could not find an equivalent for delete/update statements though.

@bgrainger It would be good to have an easier way to handle timeouts (or some documentation).
In my code (due to paranoia) i am using: command timeout + cancellation token + mysql command to ensure timeout is correctly applied everywhere; works but not very elegant.

@penguinawesome
Copy link

penguinawesome commented Sep 22, 2020

@CatalinAdlerDF i agree with you. I hope this can be set in the Code level (per connection) something like Connection.MaxExecutionTime = 8000; (in milliseconds) or something similar that will indicate the library to make either auto cancel or kill the query when the timeout reaches.

This will be a very good and critical feature. We are doing default command timeouts and with retry functionality, when the retry triggered, there will be so many additional queries that is running in the DB and causing our entire DB server to slowdown and eventually crash and unusable. We have just noticed this behaviour today and caused our entire production platform to down.

We are using the Default Command Timeout for the purpose of avoiding running very long queries in the production. In real application, this feature is a must and critical.

I really hope that this can be implemented in this library, this is a critical feature in a production @bgrainger

@ghost
Copy link

ghost commented Sep 22, 2020

But it is implemented, and it actually sends a kill command to mysql on a different connection.
You just have to use it.

My point was just to try to make it simpler, if possible.

btw, just a thought, all operations in a prod app need a correctly enforced timeout; otherwise the system will never be controllable.

@penguinawesome
Copy link

@CatalinAdlerDF how did you implement the "it actually sends a kill command to mysql on a different connection" ?

Please help me.

The only option i know is setting a /*+ MAX_EXECUTION_TIME({0}) */ in a query but we have tons of sql queries in our data layer and doing this needs to modify each every query that we have in our code which is not a fast solution.

@bgrainger
Copy link
Member

Basically we also want to cancel and cut very long running queries based on the timeout that we set and it should be reflected in the server side as well.

How can we implement this in this library?

That is currently done by explicitly calling MySqlCommand.Cancel, or by using an ExecuteX overload that takes a CancellationToken.

#455 is open to track improving this. npgsql just made a similar change (npgsql/npgsql#1567) so there may be lessons from their work this project can adopt.

@penguinawesome
Copy link

penguinawesome commented Sep 22, 2020

@bgrainger not all our data layer is using the async versions, for the sync version, we tried calling MySqlCommand.Cancel in the catch exception part however upon calling it, we received another InvalidOperationException: Connection must be Open; current state is Closed.

Our code:

using (myCommand)
{
try
{
result = myCommand.ExecuteReader();
}
catch (Exception ex)
{
try
{
myCommand.Cancel();
}
catch (Exception ex2)
{
Debug.WriteLine(ex2);
}

                throw;
            }

        }

How can we force kill the running query upon timeout in the sync version?

I am using the 0.65 version from nuget

@penguinawesome
Copy link

penguinawesome commented Sep 22, 2020

for our async version, we tried applying cancellationToken source for timeout, however, how can we determine if the task was successful or reached the timeout?

Here is our code:

var cancelToken = new CancellationTokenSource(TimeSpan.FromSeconds(5)).Token;

                var task = myCommand.ExecuteReaderAsync(cancelToken);

                result = (MySqlDataReader)await task;

                if(task.Status == TaskStatus.Canceled)
                {

                }

I am using the 0.65 version from nuget

@ghost
Copy link

ghost commented Sep 22, 2020

@firephantomassasin If you execute select sql queries you can use the mysql server support for execution timeouts: session or statement level.
Check this.

@penguinawesome
Copy link

@CatalinAdlerDF yes that's what i'm implementing. But i want to implement in the library as well such as MySqlCommand.Cancel, however it is throwing some errors.

Hi @bgrainger please help on how to use the MySqlCommand.Cancel when timeout error occurred

@ghost
Copy link

ghost commented Sep 23, 2020

@firephantomassasin Can't you use the overloads that take in a CancellationToken?
That calls Cancel behind the scenes automagically.

@bgrainger
Copy link
Member

the query keeps on running in the server while in the client side, it has been time out already ... Basically we also want to cancel and cut very long running queries based on the timeout that we set and it should be reflected in the server side as well.

There is now a proposal to implement this: #455 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

6 participants