Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unhandled exception in MySqlConnector.Utilities.TimerQueue.Callback #1002

Closed
bjornhandersson opened this issue Jun 15, 2021 · 9 comments
Closed

Comments

@bjornhandersson
Copy link

Possibly related to: #820

Version:
1.3.4

An unhandled exception in MySqlConnector.Utilities.TimerQueue.Callback crashes the process without giving the user a chance to handle it.

Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.InvalidOperationException
at MySqlConnector.MySqlConnection.get_Session()
at MySqlConnector.MySqlConnection.get_ServerThread()
at MySqlConnector.MySqlConnection.Cancel(MySqlConnector.Core.ICancellableCommand, Int32, Boolean)
at MySqlConnector.Utilities.TimerQueue.Callback(System.Object)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.TimerQueueTimer.CallCallback()
at System.Threading.TimerQueueTimer.Fire()
at System.Threading.TimerQueue.FireNextTimers()

By looking at the code it looks like the state of the connections changes meanwhile canceling the command.
I added line comments in the snippet below to explain the issue.
// Bug report comment:
// My comment.

MySqlConnection.Cancel

internal void Cancel(ICancellableCommand command, int commandId, bool isCancel)
{
	// Bug report comment:
	// Session is not null and open.
	if (m_session is null || State != ConnectionState.Open || !m_session.TryStartCancel(command))
	{
		Log.Info("Ignoring cancellation for closed connection or invalid CommandId {0}", commandId);
		return;
	}

	Log.Info("CommandId {0} for Session{1} has been canceled via {2}.", commandId, m_session.Id, isCancel ? "Cancel()" : "command timeout");

	try
	{
		// open a dedicated connection to the server to kill the active query
		var csb = new MySqlConnectionStringBuilder(m_connectionString);
		csb.Pooling = false;
		if (m_session.IPAddress is not null)
			csb.Server = m_session.IPAddress.ToString();
		var cancellationTimeout = GetConnectionSettings().CancellationTimeout;
		csb.ConnectionTimeout = cancellationTimeout < 1 ? 3u : (uint) cancellationTimeout;

		using var connection = new MySqlConnection(csb.ConnectionString);
		connection.Open();

		// Bug report comment
		// command.Connection!.ServerThread will call Session property where we assert that the state is Open which it is not anymore.
		using var killCommand = new MySqlCommand("KILL QUERY {0}".FormatInvariant(command.Connection!.ServerThread), connection);
		killCommand.CommandTimeout = cancellationTimeout < 1 ? 3 : cancellationTimeout;
		m_session.DoCancel(command, killCommand);
	}
	catch (MySqlException ex)
	{
		// cancelling the query failed; setting the state back to 'Querying' will allow another call to 'Cancel' to try again
		Log.Warn(ex, "Session{0} cancelling CommandId {1} failed", m_session!.Id, command.CommandId);
		m_session.AbortCancel(command);
	}
}
@appel1
Copy link

appel1 commented Jun 15, 2021

Looks like MySqlConnection.Cancel expects only MySqlExceptions but MySqlConnection.Session throws an InvalidOperationException.

Just ran into this as well but instead it causes a dead-lock for us because we attempt to log unhandled exceptions to a mysql db using log4net. So one thread holds a lock in log4net and waits for the lock in TimerQueue.Remove and another thread holds the lock in TimerQueue.Callback and waits for the lock in log4net. At least that is what it looks like in the dump file.

@bgrainger
Copy link
Member

bgrainger commented Jun 16, 2021

@bjornhandersson Do you know what the State is when the InvalidOperationException is thrown?

I don't know why this line accesses command.Connection! instead of just this:

using var killCommand = new MySqlCommand("KILL QUERY {0}".FormatInvariant(command.Connection!.ServerThread), connection);

It seems like we might be able to save the ServerThread at the beginning of the method, which would "solve" this race condition, with the risk of cancelling an unrelated command on a different connection. The State should already be set to CancelingQuery so I'm very curious as to what it's being changed to while in that state. (Ignore: there are two different State variables.)

@bgrainger
Copy link
Member

I'm trying to reproduce this race condition with a stress test that runs lots of DO SLEEP(1); commands and cancels them with CommandTimeout = 1. (With actually a small amount of randomness on the SLEEP timeout.) But I haven't been able to reproduce any race condition with the command/connection being closed while the cancellation is running, even with a distant MySQL Server that should cause this line to take a long time:

Any pointers on how to reproduce this reliably?

@appel1
Copy link

appel1 commented Jun 16, 2021

In my dump files the InvalidOperationException's message is: {"Connection must be Open; current state is Closed"}

   at MySqlConnector.MySqlConnection.get_Session() in /_/src/MySqlConnector/MySqlConnection.cs:line 690
   at MySqlConnector.MySqlConnection.get_ServerThread() in /_/src/MySqlConnector/MySqlConnection.cs:line 489
   at MySqlConnector.MySqlConnection.Cancel(ICancellableCommand command, Int32 commandId, Boolean isCancel) in /_/src/MySqlConnector/MySqlConnection.cs:line 719
   at MySqlConnector.Utilities.TimerQueue.Callback(Object obj) in /_/src/MySqlConnector/Utilities/TimerQueue.cs:line 80
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.TimerQueueTimer.CallCallback()
   at System.Threading.TimerQueueTimer.Fire()
   at System.Threading.TimerQueue.FireNextTimers()

Don't have a way to reproduce yet.

@appel1
Copy link

appel1 commented Jun 16, 2021

Looks like the ServerSesssion for the connection has State = Failed. So it could be that at the same time as the TimerQueue triggered a cancel the server responded with something (or closed the connection) causing the ServerSession.State to be changed to Failed. ServerSesssion.SetFailed also sets the connection state to Closed.

@bjornhandersson
Copy link
Author

Unfortunately I don't know what the State was at the point of the crash. It happens quite rarely only seen it two times on a quite busy production system over the past month. I think @appel1 in this message #1002 (comment) is on to something.
I will try reproduce it in a non-production environment.

Guess there are three potential "fixes"

  1. Resolve the command.Connection!.ServerThread early on and handle the InvalidOperationException as suggested
  2. Simply handle InvalidOperationException along with MySqlException
  3. Resolve the ServerThread directly accessing the field m_session to bypass the State assertion like int serverThread = command.Connection?.m_session?.ConnectionId ?? 0; All other access to the session in Cancel uses the field directly.

@bgrainger
Copy link
Member

bgrainger commented Jun 16, 2021

Guess there are three potential "fixes"

Yes, but each some of those involves deliberately ignoring a broken invariant that's assumed to be true about the code, i.e., that the connection for the command that's being cancelled is Open. If the MySqlConnection isn't Open, then the underlying session may have been returned to the pool and then assigned to a new MySqlConnection. Killing the query on the new connection may have unpredictable consequences.

Edit: Handling InvalidOperationException would stop the unhandled exception without sending a KILL QUERY command to a server thread that might not be expecting it.

@bgrainger
Copy link
Member

I found an interesting lock dependency in the code when I was trying to run my repro:

using var connection = new MySqlConnection(csb.ConnectionString);
connection.Open();
using var command = new MySqlCommand($"DO SLEEP(1.00);", connection) { CommandTimeout = 1 };
command.ExecuteNonQuery();

My hypothesis was that the command could finish executing and dispose the connection before cancellation completes. However, I found that when the command is being cancelled, ExecuteNonQuery doesn't return until cancellation is finished. This is because TimerQueue.Callback takes a lock:

private void Callback(object? obj)
{
lock (m_lock)

But so does TimerQueue.Remove:

public bool Remove(uint id)
{
lock (m_lock)

Which is called from MySqlDataReader.Dispose (via SetTimeout):

internal async ValueTask DisposeAsync(IOBehavior ioBehavior, CancellationToken cancellationToken)
#endif
{
if (!m_closed)
{
m_closed = true;
if (m_resultSet is not null && Command!.Connection!.State == ConnectionState.Open)
{
Command.Connection.Session.SetTimeout(Constants.InfiniteTimeout);

Thus, disposing the data reader can't complete until the command timeout callback is removed from the queue, which can't happen (due to the lock) until the command timeout callback (which cancels the command) completes running. (One could probably simulate this race condition by explicitly calling Cancel, but that wouldn't have the callstack in the OP.)

@bgrainger
Copy link
Member

Shipped a fix in 1.3.10 that catches and ignores the InvalidOperationException. I suspect there's still some (rare) race condition in this code, but this should at least improve stability by not throwing an unhandled exception.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants