Possible "zombie" connections in high-contention scenarios with Cancellation #469

@marcrocny

I'm building a service that works with a high load of database operations (ASP.NET Core 2.0 Web API). The requests can be large and simultaneous. I'm using TPL Dataflow as a backbone for the processing logic.

When cancellations start to occur, either because of connection timeouts or because a request is aborted (the RequestAborted CancellationToken is passed to all the underlying tasks), ServerSessions appear to get "lost" after being returned to the ConnectionPool. Eventually a stream of "Too Many Connections" errors comes back from the server.

Once the dust settles (all concurrent requests finish one way or another), the idle sessions are cleaned up, but many open, sleeping connections remain in the pool. These are never cleaned up until the process itself recycles, which leaves the server useless in the meantime. Sometimes it results in the dreaded "host blocked because of many errors; run flush_hosts" error. The last I hear from a zombie'd ServerSession is a "Pool(n) receiving Session... back" log message.

Some things remaining for me to check:

  • Is this really not my fault? I don't even handle disposables directly; everything is either handled via dependency injection or wrapped in using() {...}.
  • Like, really not my fault?
  • Is this somehow reproducible as a test case?
  • Will a GC clean them out?
  • Are there remaining references?
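
As a sanity check on the first question, here is a minimal sketch of the calling pattern described above, using only the BCL. `FakeSession` is a hypothetical stand-in with a counter, not the library's ServerSession, and `Task.Delay` stands in for a slow query. It only shows that a using block still disposes when the awaited work is cancelled, so it says nothing about what happens inside the pool afterwards.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a pooled connection; NOT the library's ServerSession.
public sealed class FakeSession : IDisposable
{
    public static int Outstanding; // sessions created but not yet disposed
    public FakeSession() => Interlocked.Increment(ref Outstanding);
    public void Dispose() => Interlocked.Decrement(ref Outstanding);
}

public static class Program
{
    // Same shape as the report: the session lives in a using block and the
    // request's CancellationToken is passed to the awaited work.
    private static async Task WorkAsync(CancellationToken token)
    {
        using (var session = new FakeSession())
        {
            // Stands in for a slow query that observes the token.
            await Task.Delay(TimeSpan.FromSeconds(5), token);
        }
    }

    public static async Task<int> RunAsync()
    {
        using (var cts = new CancellationTokenSource())
        {
            // Start 100 operations; each is now suspended inside its using block.
            var tasks = Enumerable.Range(0, 100)
                                  .Select(_ => WorkAsync(cts.Token))
                                  .ToArray();

            cts.Cancel(); // simulates RequestAborted firing while work is in flight

            try { await Task.WhenAll(tasks); }
            catch (OperationCanceledException) { /* expected: every task was cancelled */ }

            return FakeSession.Outstanding;
        }
    }

    public static async Task Main()
    {
        Console.WriteLine($"Outstanding sessions after cancellation: {await RunAsync()}");
    }
}
```

If the count here is zero while the real pool still ends up with sleeping connections, that would point at the return-to-pool path rather than at the caller. A real repro would presumably swap `FakeSession` for an actual connection and compare against the server's process list.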
