Suggestion - use built in thrift protocol load balancer for connectivity rather than tracking your own #13

redsolar opened this Issue Feb 5, 2010 · 2 comments



I noticed you have a pretty sophisticated tracker of up/down hosts for multi-host setups, with active/round robin/random support.

In reality, most users will likely want a true random approach (seldom r/r and almost never active) given the eventually consistent nature of Cassandra, rather than anything else. It's up to you of course, but dropping the custom tracker would simplify code maintenance a lot and improve readability.

I am in the middle of writing a highly optimized, performance-oriented read/write Cassandra CRUD class for our needs, and noticed that Thrift supports internal randomized (or r/r, but not "active") load balancing without much ado, and in addition does a very good job of downed-host detection using APC as an intermediary.

All you need to do is use a TSocketPool object instead of TSocket during cassandra object initialization.

So in this case, $transport = new TBufferedTransport(new TSocket($host, $port), 1024, 1024); will need to be replaced with $transport = new TBufferedTransport(new TSocketPool($hosts, $port), 1024, 1024); where $hosts is an array of hostnames/IPs. $port can also be an array but in Thrift's case it's expected to be a 1:1 host->port relationship, or a single unified port, so if you have 5 hosts, you can use a single unified port (such as default 9160) or an array of 5 ports, otherwise things may not work as expected.
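The host/port pairing rule above can be checked before the pool is built. The following is only a minimal sketch: `pairHostsWithPorts()` is a hypothetical helper, not part of Thrift or Pandra, and the IP addresses are made-up examples. It expands a single port to one per host and rejects a port array whose length does not match the host list.

```php
<?php
// Hypothetical helper (not part of Thrift or Pandra): turn a port argument
// into exactly one port per host, rejecting mismatched array lengths.
function pairHostsWithPorts(array $hosts, $ports)
{
    if (count($hosts) === 0) {
        throw new InvalidArgumentException('At least one host is required');
    }
    if (!is_array($ports)) {
        // Single unified port: reuse it for every host.
        $ports = array_fill(0, count($hosts), $ports);
    }
    if (count($ports) !== count($hosts)) {
        throw new InvalidArgumentException(
            'Expected one port, or exactly one port per host'
        );
    }
    // Re-index both lists so host N lines up with port N.
    return array(array_values($hosts), array_values($ports));
}

// Example addresses are placeholders; 9160 is the default port mentioned above.
list($hosts, $ports) = pairHostsWithPorts(
    array('10.0.0.1', '10.0.0.2', '10.0.0.3'),
    9160
);

// With the Thrift PHP library loaded, the result would feed the pool:
// $transport = new TBufferedTransport(new TSocketPool($hosts, $ports), 1024, 1024);
```

Failing early on a mismatched port list is a design choice of this sketch; it avoids finding out about the "may not work as expected" case only at connection time.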

TSocket/TSocketPool also seems to track open()/isOpen() internally, so it's probably not needed to do that either.

If you want a round-robin approach, you can achieve that using the setRandomize(false) method of TSocketPool.

See TSocket.php and TSocketPool.php for more options too.


Thanks for the well-thought-out feedback - it's really appreciated :) In terms of design decisions, the Pandra::getClient() function is a code stub which is to be developed for 0.2, as I see not being able to select a specific node for read/write (i.e. an active node) in TSocketPool as a fairly significant limitation.

The major issue I had with TSocketPool as it stands is from an app development perspective: in a random access arrangement, it can't guarantee read consistency against a key immediately after it has been written (for consistency one or quorum). I love Cassandra's consistency model but can't help thinking that it's somewhat of a liability where user experience or data dependence between processes is an application requirement.

I marked this as 'APC round robin, named clusters, node auto discovery' in the roadmap for a 0.2 tag - it's not a verbose description of the (pretty big) issue but basically consists of :

  • keyspaces bound to their own connection pools

  • apc and memcache random/round access. You're right though, round is probably too little gain for the effort so might junk that.

  • apc/memcache key > node maps with short/tweakable expiries to guarantee temporal read consistency for a single key for setups without consistency 'all'. Small margin of error here, but hopefully where a key is updated in quick succession any calls for the key against the key/node map will retrieve the most recent version.

  • consolidating the readMode()/writeMode() functions against the key/node map, these modes are currently just stubs

  • node auto discovery and consistency level downgrades
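As a rough illustration of the key > node map item above, here is a minimal sketch. It assumes a plain in-memory array standing in for APC/memcache, and the class name `KeyNodeMap`, its methods, the key `user:42` and the node name `node-b` are all hypothetical. Timestamps are passed in explicitly so the expiry behaviour is easy to follow.

```php
<?php
// Hypothetical sketch: remembers which node last accepted a write for a key,
// so reads inside a short window can be sent to that same node.
// A plain array stands in for APC/memcache here.
class KeyNodeMap
{
    private $ttl;
    private $entries = array();

    public function __construct($ttlSeconds)
    {
        $this->ttl = $ttlSeconds;
    }

    // Record the node that just accepted a write for $key.
    public function remember($key, $node, $now)
    {
        $this->entries[$key] = array(
            'node'    => $node,
            'expires' => $now + $this->ttl,
        );
    }

    // Return the pinned node, or null if there is no entry or it has expired.
    public function lookup($key, $now)
    {
        if (!isset($this->entries[$key])) {
            return null;
        }
        if ($now >= $this->entries[$key]['expires']) {
            unset($this->entries[$key]);
            return null;
        }
        return $this->entries[$key]['node'];
    }
}

$map = new KeyNodeMap(2);                  // 2-second window, an arbitrary choice
$map->remember('user:42', 'node-b', 100);  // write went to node-b at t=100
$pinned  = $map->lookup('user:42', 101);   // still inside the window
$expired = $map->lookup('user:42', 102);   // window has elapsed
```

When `lookup()` returns null, the caller would fall back to normal random node selection. A real version would presumably lean on the cache's own TTL support rather than storing expiry times by hand.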

Additionally, after your suggestion, I'll move the pool logic out of core and into a child of TSocket.php with some heavy borrowing from TSocketPool's open() code (TSocketPool itself isn't extensible to this). This will come with the addition of named pools and named hosts for more fine-grained control.



No problem.

For our needs, I am writing a simple(r) CRUD class, which is a lot more oriented towards performance than extensibility. It's more of a storage bridge, so that developers can pass things back and forth without needing to understand Cassandra, the consistency model, and a data model that is somewhat confusing for a newcomer who has worked mostly with relational data.

When I discovered your class, I liked the idea, but it's not what I am after. Having broken a few nails on Thrift/Cassandra interaction so far, I figured it would be good to give back to someone taking on the task of writing a more general, rapid-development-oriented class.

There will be more to come :)

This issue was closed.