
Possible "zombie" connections in high-contention scenarios with Cancellation #469

Closed
marcrocny opened this issue Mar 27, 2018 · 3 comments


@marcrocny
Contributor

marcrocny commented Mar 27, 2018

I'm building an ASP.NET Core 2.0 Web API service that performs a high volume of database operations. Requests can be large and arrive concurrently. I'm using TPL Dataflow as the backbone of the processing logic.

When cancellations start to occur, either because of connection timeouts or because a request is aborted (the RequestAborted CancellationToken is passed to all the underlying tasks), ServerSessions appear to get "lost" after being returned to the ConnectionPool. Eventually the server starts returning a stream of "Too many connections" errors.

Once the dust settles (all concurrent requests finish one way or another), the idle sessions are cleaned up, but there will still be many open, sleeping connections in the pool. These are never cleaned up until the process itself recycles, which renders the server unusable in the meantime. Sometimes it results in the dreaded "host blocked because of many errors; run flush_hosts". The last log message I see from a zombie'd ServerSession is "Pool(n) receiving Session... back".

Some things remaining for me to check:

  • Is this really not my fault? I don't handle disposables directly; everything is either managed via dependency injection or wrapped in using() {...}.
  • Like, really not my fault?
  • Is this somehow reproducible as a test case?
  • Will a GC clean them out?
  • Are there remaining references?
@bgrainger
Member

bgrainger commented Mar 28, 2018

> ServerSessions appear to get "lost" after being returned to the ConnectionPool

I can (sometimes, not reliably) reproduce what I think you're talking about by running this code thousands of times in parallel:

using (var connection = new MySqlConnection(connectionString))
{
	await connection.OpenAsync();
	using (var cmd = connection.CreateCommand())
	{
		cmd.CommandText = "SELECT SLEEP(5);";
		await cmd.ExecuteScalarAsync();
	}
}	

The first time I run it, most of the connections time out with a "Connect Timeout expired" exception. But after all the tasks complete, there can be a number of open connections that the pool doesn't know about. If these accumulate, the "Too many connections" error happens.

I can't seem to reproduce it when logging is enabled, which suggests a race condition or similar concurrency problem that the extra overhead of logging happens to mask.

This needs further investigation, but it's looking like a pooling bug in MySqlConnector.
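To run a snippet like the one above thousands of times concurrently and tally how each attempt ended (success versus, say, a connect-timeout exception), a small harness along these lines could be used. This is a sketch under stated assumptions: `ParallelHarness` and `RunAsync` are hypothetical names, and the database work is passed in as a delegate so the harness itself does not depend on MySqlConnector.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stress harness (not part of MySqlConnector): starts `count` copies
// of an async operation at once and counts how each one finished.
public static class ParallelHarness
{
    public static async Task<IReadOnlyDictionary<string, int>> RunAsync(int count, Func<Task> operation)
    {
        var outcomes = new ConcurrentDictionary<string, int>();

        async Task RunOne()
        {
            string key;
            try
            {
                await operation().ConfigureAwait(false);
                key = "Success";
            }
            catch (Exception ex)
            {
                // Group failures by exception type name.
                key = ex.GetType().Name;
            }
            outcomes.AddOrUpdate(key, 1, (_, n) => n + 1);
        }

        // Task.Run so that any synchronous prefix of operation() doesn't serialize the starts.
        await Task.WhenAll(Enumerable.Range(0, count).Select(_ => Task.Run(() => RunOne())))
                  .ConfigureAwait(false);
        return outcomes;
    }
}
```

For the reproduction, the operation would be the body of the snippet above (open a `MySqlConnection`, run `SELECT SLEEP(5);` with `ExecuteScalarAsync`); after the run completes, the server's connection count can be compared with what the pool should be holding.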

@bgrainger
Member

bgrainger commented Mar 28, 2018

I've identified (one) problem: if it takes a long time to retrieve a connection from the pool (e.g., under high contention), the CancellationToken passed to ConnectionPool.GetSessionAsync may be cancelled during TryResetConnectionAsync. When this happens, the Session may still be in a valid state, but it is neither returned to the caller nor stored back in the pool, so it leaks. Meanwhile, the catch block releases the SemaphoreSlim, letting another thread in; that thread creates a new connection, and as leaked connections accumulate the server eventually returns "Too many connections".
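To make that failure mode concrete, here is a minimal sketch of a bounded pool with the same shape: a SemaphoreSlim gate, an idle list, and a cancellable reset step. `SketchPool`, `FakeSession`, and the use of `Task.Delay` as the "reset" are hypothetical stand-ins, not MySqlConnector's ConnectionPool or ServerSession code. The catch block shows one way to avoid the leak (returning the session to the idle list before releasing the semaphore); it is not necessarily what the 0.38.0 fix does.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a pooled session (not MySqlConnector's ServerSession).
public sealed class FakeSession
{
    public FakeSession(int id) => Id = id;
    public int Id { get; }
}

// Minimal bounded-pool sketch; names are illustrative assumptions, not a real API.
public sealed class SketchPool
{
    private readonly SemaphoreSlim _slots;
    private readonly ConcurrentBag<FakeSession> _idle = new ConcurrentBag<FakeSession>();
    private int _created;

    public SketchPool(int maxSize) => _slots = new SemaphoreSlim(maxSize, maxSize);

    public int CreatedCount => Volatile.Read(ref _created);
    public int IdleCount => _idle.Count;

    public async Task<FakeSession> GetSessionAsync(TimeSpan resetDuration, CancellationToken token)
    {
        // Under high contention, this wait can use up most of the caller's timeout.
        await _slots.WaitAsync(token).ConfigureAwait(false);

        FakeSession session = null;
        try
        {
            if (!_idle.TryTake(out session))
                session = new FakeSession(Interlocked.Increment(ref _created));

            // Stand-in for the reset round-trip; the token can fire while it is in flight.
            await Task.Delay(resetDuration, token).ConfigureAwait(false);
            return session;
        }
        catch
        {
            // Without the next two lines the session is neither returned to the caller
            // nor stored in the pool (it leaks), yet Release() still lets another caller
            // in to create a brand-new one. A real pool might close the session instead
            // if the interrupted reset left it in an unknown state.
            if (session != null)
                _idle.Add(session);
            _slots.Release();
            throw;
        }
    }

    public void ReturnSession(FakeSession session)
    {
        _idle.Add(session);
        _slots.Release();
    }
}
```

With this version, a caller whose token fires during the reset still gets an OperationCanceledException, but the session stays accounted for in the idle list, so the next caller reuses it instead of opening another connection.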

@bgrainger
Member

bgrainger commented Mar 28, 2018

Fixed in 0.38.0.
