Data model inversion? #44

amalloy opened this Issue Sep 28, 2010 · 2 comments



amalloy commented Sep 28, 2010

It's possible I've got this all wrong, but I've thought about it for a while and I'm reasonably certain I understand Cassandra. In "WTF is a SuperColumn", the data model is described as Keyspace.SuperColumnFamily[Row][Super][Column] = value. Pandra does not have a Row type; instead each SCF has a keyID, which is the Row index (first key in brackets). This means that, in order to add rows to a SuperColumnFamily, we must create a NEW SuperColumnFamily object for every entry, with the same keyID (since Pandra uses that to mean the second member of the dotted pair) but a different name. This is backward: there should be a Row class or some such, which does what SCF does now (hold SuperColumns), and the SuperColumnFamily class should be repurposed to be solely a Map<String, Row>.
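To make the four levels concrete, here is a minimal sketch of the model as plain nested PHP arrays. Everything in it is hypothetical illustration: the family name `Users`, the row keys, and the column names are made up, and no Pandra or Thrift types are involved.

```php
<?php
// Hypothetical illustration only: plain nested arrays, not Pandra or Thrift types.
// Shape: Keyspace.SuperColumnFamily[Row][Super][Column] = value
$keyspace = array(
    'Users' => array(                 // SuperColumnFamily name (made up)
        'alice' => array(             // row key
            'address' => array(       // SuperColumn name
                'city' => 'Paris',    // Column name => value
                'zip'  => '75001',
            ),
        ),
        'bob' => array(
            'address' => array('city' => 'Oslo'),
        ),
    ),
);

// Adding a row is just another key inside the same family,
// not a new family object.
$keyspace['Users']['carol'] = array('address' => array('city' => 'Lima'));

echo $keyspace['Users']['alice']['address']['city'], "\n"; // prints "Paris"
echo count($keyspace['Users']), "\n";                      // prints 3
```

Seen this way, the family is the map from row key to row, and each row is what holds the SuperColumns.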

More than just a naming issue, this implementation has technical implications. Specifically, in PandraSuperColumnFamily::save(), there is a comment /* @todo there must be a better way */, followed by a loop over all of the SuperColumn children. There is a better way! The Thrift method batch_mutate takes a keyspace and a map<string, map<string, list<Mutation>>>. A Mutation, meanwhile, can describe a SuperColumn insertion, which itself is a list of Column insertions. Pandra is not making use of all of these levels of hierarchy: every save() call in Pandra's API could be implemented as a single Thrift call, with no need for multiple requests.
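The shape of that nested map can be sketched with plain arrays. This assumes the 0.6-era Thrift interface, where the outer map is keyed by row key and the inner map by column family name. The helper `buildMutationMap`, the array field names, and the sample data are all hypothetical; plain arrays stand in for the real Thrift structs, and nothing is sent to a server.

```php
<?php
// Hypothetical helper: plain arrays stand in for Thrift's Mutation,
// SuperColumn and Column structs, which are not constructed here.
function buildMutationMap($familyName, array $rows, $timestamp)
{
    $mutationMap = array();
    foreach ($rows as $rowKey => $superColumns) {
        $mutations = array();
        foreach ($superColumns as $superName => $columns) {
            $cols = array();
            foreach ($columns as $name => $value) {
                $cols[] = array('name' => $name, 'value' => $value, 'timestamp' => $timestamp);
            }
            // One mutation per SuperColumn, carrying all of its Columns.
            $mutations[] = array('super_column' => array('name' => $superName, 'columns' => $cols));
        }
        // Outer key: row key. Inner key: column family name (assumed ordering).
        $mutationMap[$rowKey] = array($familyName => $mutations);
    }
    return $mutationMap;
}

$map = buildMutationMap('Users', array(
    'alice' => array('address' => array('city' => 'Paris', 'zip' => '75001')),
    'bob'   => array('address' => array('city' => 'Oslo')),
), 1285632000);

echo count($map), "\n";                                                  // prints 2
echo count($map['alice']['Users'][0]['super_column']['columns']), "\n";  // prints 2
```

The whole family ends up in one structure, so a single request could carry every row instead of one request per SuperColumn.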

My rough sketch of an implementation would be:

class SCF {
  function save() {
    $mutations = array();
    foreach ($this->getRows() as $key => $superCol) {
      // batch_mutate expects row key => column family name => list of mutations
      $mutations[$key] = array($this->name => array($superCol->getMutation())); // see below
    }
    // one call saves every row of just this SCF
    $client->batch_mutate($this->keyspace, $mutations);
  }
}

class SuperColumn {
  function getMutation() {
    $cols = array();
    foreach ($this->getColumns() as $name => $value) {
      $cols[] = new ThriftColumn($name, $value);
    }
    // a single mutation inserts the SuperColumn together with all of its Columns
    return new ThriftMutation(INSERT, new ThriftSuperColumn($this->name, $cols));
  }
}

Obviously this glosses over quite a few details, like deletions, but I think the structure is right. I definitely sympathize with your erroneous (but see disclaimer at top!) implementation: even when you know exactly what to do it's hard to think about SuperColumnFamilies!


mjpearson commented Oct 7, 2010

Thanks for the deeper level of thought :) A row (or keyspace) container would be best; it would also be great to tie authentication and connection pooling into core. I'm fine to do this, but did you have any other specifics in mind?

amalloy commented Oct 7, 2010

No specifics, particularly. I've just started to get into Pandra's internals, so I don't have enough insight to be helpful there; I really just noticed the weird API because it conflicted with what I knew about Cassandra. I'll let you know if I have any clever ideas while I'm working on 0.7.
