Socket failures under sustained load #70
Comments
Sounds like an issue with Thrift or System.Net. I don't see how I can fix this on the FluentCassandra side of things.
Oh, I didn't see a reference to Thrift, so I assumed the Thrift code in your project was your own implementation.
I decided to include the Thrift code in FluentCassandra instead of adding a reference. The code was taken from this project, which is also what Cassandra uses on the other side of the connection. I'm not totally convinced it is Thrift, though; judging by the error message, it sounds like you reached the Windows connection limit.
Looks like it was my lack of understanding of the ConnectionBuilder. By default it doesn't do pooling, and CassandraContext creates a new session for every execution, so it opens a new socket for every execute; naturally, that depletes sockets quickly. Once pooling is turned on, the single-threaded example works fine, but my multi-threaded one (20 workers with a 50 ms sleep between operations) now hits a timeout trying to get a connection from the connection pool, so I may still be misconfiguring something, or there is a bottleneck in the connection pool that doesn't release connections back into the pool fast enough. Will investigate further.
Connection pools aren't unlimited. The maximum must be at least the number of threads you spin up; realistically, though, the connection pool max should be about 2.5 times the number of threads you have running.
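For the 20-worker test above, that rule of thumb means a pool of at least 20 and ideally around 2.5 × 20 = 50 connections. A minimal sketch of what that configuration might look like; the connection-string keyword names and the constructor overload below are assumptions, not verified against FluentCassandra's ConnectionBuilder:

```csharp
// Hypothetical sketch: keyword names ("Pooling", "Max Pool Size",
// "Connection Timeout") are assumptions and may differ in the
// actual ConnectionBuilder.
var connectionString =
    "Server=127.0.0.1:9160;" +
    "Keyspace=Demo;" +
    "Pooling=True;" +          // pooling is off by default (see above)
    "Max Pool Size=50;" +      // ~2.5x the 20 worker threads
    "Connection Timeout=10";   // fail fast instead of hanging

var db = new CassandraContext(connectionString);
```

With pooling enabled, all 20 workers draw from and return to the same pool instead of each execute opening a fresh socket.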
You may be right. I was thinking about shaving down the frame where it …
I think there is already an open issue around how the connection pool releases the connections back to the pool. |
Found it: #29 I'll follow up in that thread. |
Was doing some simple load testing and quickly ran into socket failures of the type:

> An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full 127.0.0.1:9160

which is wrapped by "No connection could be made because all servers have failed."
I know this example is degenerate in that it calls save for each new record, but it is supposed to mimic many requests coming into the server, each logging events, with each request creating a new context using the ConnectionBuilder. Originally it was running in multiple threads with multiple contexts, but I tried to reduce it to the simplest repro that still failed.
Output from the above is:
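The original snippet and its output didn't survive here, but a repro along these lines would exhaust sockets when pooling is off, since every save round-trip opens a fresh connection. Type and method names below follow common FluentCassandra patterns but are assumptions, not the reporter's actual code:

```csharp
// Hypothetical repro sketch; API names are assumed, not verified.
// With pooling disabled, each iteration opens a new socket that then
// lingers in TIME_WAIT, exhausting ephemeral ports under load.
for (int i = 0; i < 100000; i++)
{
    using (var db = new CassandraContext(keyspace: "Logs", server: "127.0.0.1"))
    {
        var family = db.GetColumnFamily("Events");
        family.InsertColumn("row-" + i, "message", "event " + i);
        db.SaveChanges(); // one socket per iteration when pooling is off
    }
}
```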