Socket failures under sustained load #70

sdether · 2012-10-06T23:02:51Z

Was doing some simple load testing and quickly ran into socket failures of the type "An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full 127.0.0.1:9160", which is wrapped by "No connection could be made because all servers have failed."

[Test]
public void Socket_failure() {
    var keyspaceName = "test";
    var connectionBuilder = new ConnectionBuilder("test", Server);
    var id = 1;
    try {
        using(var db = new CassandraContext(connectionBuilder)) {
            if(db.KeyspaceExists(keyspaceName))
                db.DropKeyspace(keyspaceName);
            var keyspace = new CassandraKeyspace(new CassandraKeyspaceSchema {
                Name = keyspaceName,
            }, db);
            keyspace.TryCreateSelf();
            if(!keyspace.ColumnFamilyExists("event")) {
                keyspace.TryCreateColumnFamily(new CassandraColumnFamilySchema {
                    FamilyName = "event",
                    KeyValueType = CassandraType.IntegerType,
                    ColumnNameType = CassandraType.UTF8Type,
                    DefaultColumnValueType = CassandraType.UTF8Type,
                    Columns = { new CassandraColumnSchema() { Name = "Name" } }
                });
            }
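            // Write one record per SaveChanges() call until something fails.
            // As found later in this thread, without pooling each execution opens a new socket.
            while(true) {
                var evFamily = db.GetColumnFamily("event");
                dynamic ev = evFamily.CreateRecord(new BigInteger(id));
                db.Attach(ev);
                ev.Name = "view " + id;
                db.SaveChanges();
                id++;
            }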
        }
    } catch {
        Console.WriteLine("died after {0} ops", id);
        throw;
    }
}

I know this example is degenerate in that it calls save for each new record, but it is supposed to mimic many requests coming into the server, each logging events, where each request would create a new context using the ConnectionBuilder. Originally it was running in multiple threads with multiple contexts, but I tried to reduce it to the simplest repro that still failed.

Output from the above is:

keyspace setup: 1b7fde6c-700a-3bc9-b7d2-3188a1e48fc8
column family setup: 2abf84a8-f8d6-34f7-9f02-646f0478f0cc
connection: System.Net.Sockets.SocketException (0x80004005): An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full 127.0.0.1:9160
   at System.Net.Sockets.TcpClient.Connect(String hostname, Int32 port)
   at Thrift.Transport.TSocket.Open() in C:\github\fluentcassandra\src\Thrift\Transport\TSocket.cs:line 134
   at Thrift.Transport.TFramedTransport.Open() in C:\github\fluentcassandra\src\Thrift\Transport\TFramedTransport.cs:line 53
   at FluentCassandra.Connections.Connection.Open() in C:\github\fluentcassandra\src\Connections\Connection.cs:line 137
   at FluentCassandra.Connections.NormalConnectionProvider.Open() in C:\github\fluentcassandra\src\Connections\NormalConnectionProvider.cs:line 39
connection: localhost:9160,0 secs has been blacklisted
FluentCassandra.CassandraException: No connection could be made because all servers have failed. ---> FluentCassandra.CassandraException: No connection could be made because all servers have failed.
   at FluentCassandra.Connections.NormalConnectionProvider.Open() in C:\github\fluentcassandra\src\Connections\NormalConnectionProvider.cs:line 51
   at FluentCassandra.CassandraSession.GetClient(Boolean setKeyspace, Nullable`1 setCqlVersion) in C:\github\fluentcassandra\src\CassandraSession.cs:line 103
   at FluentCassandra.Operations.BatchMutate.Execute() in C:\github\fluentcassandra\src\Operations\BatchMutate.cs:line 53
   at FluentCassandra.Operations.Operation`1.TryExecute(TResult& result) in C:\github\fluentcassandra\src\Operations\Operation.cs:line 24
   --- End of inner exception stack trace ---
died after 16328 ops


nberardi · 2012-10-06T23:09:44Z

Sounds like an issue with Thrift or System.Net.

I don't see how I can fix this on the Fluent Cassandra side of things.

sdether · 2012-10-06T23:13:06Z

Oh, I didn't see a reference to Thrift, so I assumed that the Thrift code in your project was your own implementation.

nberardi · 2012-10-06T23:18:33Z

I decided to include the Thrift code in FluentCassandra instead of adding a reference. The code was taken from this project, which is also what Cassandra uses on the other side of the connection.

http://thrift.apache.org/

I am not totally convinced it is Thrift, though. Judging by the error message, you reached the connection limit for Windows.
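
A rough check on the connection-limit theory, assuming the test machine used the default dynamic (ephemeral) port range of Windows Vista and later, 49152 to 65535, which the thread does not confirm. That range holds 16384 ports, and the run above died after 16328 operations. If every operation opened a fresh socket and the closed ones lingered in TIME_WAIT, the supply of local ports would run out at about that count. The class and method names below are illustrative only.

```csharp
using System;

public static class EphemeralPortCheck
{
    // Assumption: default dynamic port range on Windows Vista and later.
    public const int FirstDynamicPort = 49152;
    public const int LastDynamicPort = 65535;

    // Number of local ports available for outgoing connections.
    public static int DynamicPortCount()
    {
        return LastDynamicPort - FirstDynamicPort + 1;
    }

    // Ports remaining if each connection holds one port and none is reused yet.
    public static int PortsLeftAfter(int connectionsOpened)
    {
        return DynamicPortCount() - connectionsOpened;
    }

    public static void Main()
    {
        Console.WriteLine("dynamic ports: {0}", DynamicPortCount());
        Console.WriteLine("left after 16328 ops: {0}", PortsLeftAfter(16328));
    }
}
```

The small remainder would be accounted for by the schema setup calls and by other processes on the machine, so the numbers are consistent with port exhaustion but do not prove it.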

sdether · 2012-10-07T00:03:10Z

Looks like it was my lack of understanding of the ConnectionBuilder. By default it doesn't do pooling, and CassandraContext creates a new session for every execution, so every execute opens a new socket, and sure, that'll deplete sockets quickly. Once pooling is turned on, the single-threaded example works fine. My multi-threaded one (using 20 workers and a 50ms sleep between ops) now hits a timeout trying to get a connection from the connection pool, so either I am still misconfiguring something or there is a bottleneck in the connection pool that doesn't release connections back into the pool fast enough. Will investigate further.
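
The timeout described here is what any bounded pool does when borrowers outnumber free connections for longer than the wait limit. The following is a minimal sketch of that behavior, not FluentCassandra's pool: `BoundedPool` and `FakeConnection` are hypothetical names, and the real implementation may differ.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-in for a real connection; not a FluentCassandra type.
public sealed class FakeConnection { }

// Minimal bounded pool: at most maxSize connections are lent out at once.
public sealed class BoundedPool
{
    private readonly ConcurrentBag<FakeConnection> _idle = new ConcurrentBag<FakeConnection>();
    private readonly SemaphoreSlim _slots;

    public BoundedPool(int maxSize)
    {
        _slots = new SemaphoreSlim(maxSize, maxSize);
    }

    // Waits up to 'timeout' for a free slot, then reuses an idle
    // connection if one exists or creates a new one.
    public FakeConnection Borrow(TimeSpan timeout)
    {
        if (!_slots.Wait(timeout))
            throw new TimeoutException("No pooled connection became available in time.");
        FakeConnection conn;
        return _idle.TryTake(out conn) ? conn : new FakeConnection();
    }

    // Puts the connection back and frees its slot for the next borrower.
    public void Return(FakeConnection conn)
    {
        _idle.Add(conn);
        _slots.Release();
    }
}

public static class PoolDemo
{
    public static void Main()
    {
        var pool = new BoundedPool(2);
        var a = pool.Borrow(TimeSpan.FromMilliseconds(50));
        var b = pool.Borrow(TimeSpan.FromMilliseconds(50));
        try
        {
            pool.Borrow(TimeSpan.FromMilliseconds(50)); // both slots are taken
        }
        catch (TimeoutException ex)
        {
            Console.WriteLine(ex.Message);
        }
        pool.Return(a);
        var c = pool.Borrow(TimeSpan.FromMilliseconds(50)); // succeeds and reuses a
        Console.WriteLine(ReferenceEquals(a, c));
    }
}
```

In this sketch a borrower times out only if no slot is freed within the wait limit, so a pool that is too small, or one that returns connections late, produces the same symptom.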

nberardi · 2012-10-07T00:12:29Z

Connection pools aren't unlimited. The maximum must be at least the number of threads you spin up. Realistically, though, the connection pool maximum should be about 2.5 times the number of threads you have running.
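
As a worked example of that rule of thumb, a small helper can turn a worker count into a suggested pool maximum. `PoolSizing` and `SuggestedMaxPoolSize` are hypothetical names used for illustration, and the 2.5 factor is the estimate from this comment, not a library setting.

```csharp
using System;

public static class PoolSizing
{
    // Rule of thumb from the comment above: about 2.5 connections per thread,
    // rounded up, which is always at least one connection per thread.
    public static int SuggestedMaxPoolSize(int workerThreads)
    {
        if (workerThreads < 1)
            throw new ArgumentOutOfRangeException("workerThreads");
        return (int)Math.Ceiling(workerThreads * 2.5);
    }

    public static void Main()
    {
        // The multi-threaded test in this thread used 20 workers.
        Console.WriteLine(SuggestedMaxPoolSize(20)); // prints 50
    }
}
```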

nberardi · 2012-10-07T00:18:40Z

You may be right. I was thinking about shaving down the frame where it
meets the floor to make it more square.


nberardi · 2012-10-07T00:25:37Z

I think there is already an open issue around how the connection pool releases the connections back to the pool.

sdether · 2012-10-07T04:48:00Z

Found it: #29

I'll follow up in that thread.

sdether closed this as completed Oct 7, 2012

ghost assigned nberardi Oct 11, 2012
