This repository has been archived by the owner on May 25, 2021. It is now read-only.

Socket failures under sustained load #70

Closed
sdether opened this issue Oct 6, 2012 · 8 comments

@sdether
Contributor

sdether commented Oct 6, 2012

I was doing some simple load testing and quickly ran into socket failures of the type "An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full 127.0.0.1:9160", which is wrapped by "No connection could be made because all servers have failed."

[Test]
public void Socket_failure() {
    var keyspaceName = "test";
    var connectionBuilder = new ConnectionBuilder("test", Server);
    var id = 1;
    try {
        using(var db = new CassandraContext(connectionBuilder)) {
            if(db.KeyspaceExists(keyspaceName))
                db.DropKeyspace(keyspaceName);
            var keyspace = new CassandraKeyspace(new CassandraKeyspaceSchema {
                Name = keyspaceName,
            }, db);
            keyspace.TryCreateSelf();
            if(!keyspace.ColumnFamilyExists("event")) {
                keyspace.TryCreateColumnFamily(new CassandraColumnFamilySchema {
                    FamilyName = "event",
                    KeyValueType = CassandraType.IntegerType,
                    ColumnNameType = CassandraType.UTF8Type,
                    DefaultColumnValueType = CassandraType.UTF8Type,
                    Columns = { new CassandraColumnSchema() { Name = "Name" } }
                });
            }
            while(true) {
                var evFamily = db.GetColumnFamily("event");
                dynamic ev = evFamily.CreateRecord(new BigInteger(id));
                db.Attach(ev);
                ev.Name = "view " + id;
                db.SaveChanges();
                id++;
            }
        }
    } catch {
        Console.WriteLine("died after {0} ops", id);
        throw;
    }
}

I know this example is degenerate in that it calls save for each new record, but it is meant to mimic many requests coming into the server, each logging events, where each request creates a new context using the ConnectionBuilder. Originally it ran in multiple threads with multiple contexts, but I tried to reduce it to the simplest repro that still failed.

Output from the above is:

keyspace setup: 1b7fde6c-700a-3bc9-b7d2-3188a1e48fc8
column family setup: 2abf84a8-f8d6-34f7-9f02-646f0478f0cc
connection: System.Net.Sockets.SocketException (0x80004005): An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full 127.0.0.1:9160
   at System.Net.Sockets.TcpClient.Connect(String hostname, Int32 port)
   at Thrift.Transport.TSocket.Open() in C:\github\fluentcassandra\src\Thrift\Transport\TSocket.cs:line 134
   at Thrift.Transport.TFramedTransport.Open() in C:\github\fluentcassandra\src\Thrift\Transport\TFramedTransport.cs:line 53
   at FluentCassandra.Connections.Connection.Open() in C:\github\fluentcassandra\src\Connections\Connection.cs:line 137
   at FluentCassandra.Connections.NormalConnectionProvider.Open() in C:\github\fluentcassandra\src\Connections\NormalConnectionProvider.cs:line 39
connection: localhost:9160,0 secs has been blacklisted
FluentCassandra.CassandraException: No connection could be made because all servers have failed. ---> FluentCassandra.CassandraException: No connection could be made because all servers have failed.
   at FluentCassandra.Connections.NormalConnectionProvider.Open() in C:\github\fluentcassandra\src\Connections\NormalConnectionProvider.cs:line 51
   at FluentCassandra.CassandraSession.GetClient(Boolean setKeyspace, Nullable`1 setCqlVersion) in C:\github\fluentcassandra\src\CassandraSession.cs:line 103
   at FluentCassandra.Operations.BatchMutate.Execute() in C:\github\fluentcassandra\src\Operations\BatchMutate.cs:line 53
   at FluentCassandra.Operations.Operation`1.TryExecute(TResult& result) in C:\github\fluentcassandra\src\Operations\Operation.cs:line 24
   --- End of inner exception stack trace ---
died after 16328 ops
@nberardi
Contributor

nberardi commented Oct 6, 2012

Sounds like an issue with Thrift or System.Net.

I don't see how I can fix this on the Fluent Cassandra side of things.

@sdether
Contributor Author

sdether commented Oct 6, 2012

Oh, I didn't see a reference to Thrift, so I assumed that the Thrift code in your project was your own implementation.

@nberardi
Contributor

nberardi commented Oct 6, 2012

I decided to include the Thrift code in FluentCassandra instead of adding a reference. The code was taken from this project, which is also what Cassandra uses on the other side of the connection.

http://thrift.apache.org/

I am not totally convinced it is Thrift, though. Judging by the error message, it sounds like you reached the connection limit for Windows.

@sdether
Contributor Author

sdether commented Oct 7, 2012

Looks like it was my lack of understanding of the ConnectionBuilder. By default it doesn't do pooling, and CassandraContext creates a new session for every execution, so it opens a new socket for every execute, and sure, that'll deplete sockets quickly. Once pooling is turned on, the single-threaded example works fine, but my multi-threaded one (using 20 workers and a 50ms sleep between ops) now hits a timeout trying to get a connection from the connection pool. So I may still be misconfiguring something, or there is some bottleneck in the connection pool that doesn't release connections back into the pool fast enough. Will investigate further.
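
To make the two failure modes above concrete, here is a minimal sketch of a bounded pool. It is a hypothetical illustration, not FluentCassandra's actual pool: `BoundedPool<T>`, `Acquire`, and `Release` are names invented for this example. It shows why reuse avoids opening a new socket per operation, and why callers time out when every slot is checked out and nothing is released in time.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical illustration only; this is NOT FluentCassandra's pool.
public sealed class BoundedPool<T> where T : class
{
    private readonly ConcurrentBag<T> _idle = new ConcurrentBag<T>();
    private readonly SemaphoreSlim _slots;   // one permit per allowed connection
    private readonly Func<T> _factory;       // opens a new connection when no idle one exists
    private int _created;

    public BoundedPool(int maxSize, Func<T> factory)
    {
        if (factory == null) throw new ArgumentNullException(nameof(factory));
        _slots = new SemaphoreSlim(maxSize, maxSize);
        _factory = factory;
    }

    // Number of connections actually opened so far.
    public int Created => Volatile.Read(ref _created);

    // Waits up to `timeout` for a free slot, then reuses an idle
    // connection if one exists or opens a new one.
    public T Acquire(TimeSpan timeout)
    {
        if (!_slots.Wait(timeout))
            throw new TimeoutException("No pooled connection became available in time.");

        if (_idle.TryTake(out var item))
            return item;

        try
        {
            var created = _factory();
            Interlocked.Increment(ref _created);
            return created;
        }
        catch
        {
            _slots.Release();   // do not leak the slot if opening fails
            throw;
        }
    }

    // Returns a connection to the pool and frees its slot.
    public void Release(T item)
    {
        _idle.Add(item);
        _slots.Release();
    }
}
```

With a pool like this, a loop that acquires and releases around each operation opens at most `maxSize` connections no matter how many operations run. If workers hold connections longer than the acquire timeout, or connections are not released promptly, `Acquire` throws `TimeoutException`, which matches the shape of the symptom described above.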

@nberardi
Contributor

nberardi commented Oct 7, 2012

Connection pools aren't unlimited. The maximum pool size must be at least the number of threads you spin up. Realistically, though, the connection pool max should be about 2.5 times the number of threads you have running.
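
As a quick worked example of that rule of thumb, the sketch below computes a suggested maximum pool size from the worker count. `PoolSizing` and `SuggestedMaxPoolSize` are hypothetical helpers made up for illustration, not part of FluentCassandra, and the 2.5 factor is just the estimate from the comment above.

```csharp
using System;

// Hypothetical helper, not part of FluentCassandra.
public static class PoolSizing
{
    // Rule of thumb from the comment above: at least one connection per
    // worker thread, and realistically about 2.5x the thread count.
    public static int SuggestedMaxPoolSize(int workerThreads, double factor = 2.5)
    {
        if (workerThreads < 1)
            throw new ArgumentOutOfRangeException(nameof(workerThreads));
        if (factor < 1.0)
            throw new ArgumentOutOfRangeException(nameof(factor));

        // Round up so the result never drops below the thread count.
        return (int)Math.Ceiling(workerThreads * factor);
    }
}
```

For the 20-worker test described earlier in the thread, `PoolSizing.SuggestedMaxPoolSize(20)` gives 50, with 20 as the hard minimum.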

@nberardi
Contributor

nberardi commented Oct 7, 2012

You may be right. I was thinking about shaving down the frame where it meets the floor to make it more square.


@nberardi
Contributor

nberardi commented Oct 7, 2012

I think there is already an open issue around how the connection pool releases the connections back to the pool.

@sdether
Contributor Author

sdether commented Oct 7, 2012

Found it: #29

I'll follow up in that thread.

@sdether sdether closed this as completed Oct 7, 2012
@ghost ghost assigned nberardi Oct 11, 2012