Data model inversion? #44

Open
amalloy opened this Issue · 2 comments

Alan Malloy

It's possible I've got this all wrong, but I've thought about it for a while and I'm reasonably certain I understand Cassandra. In "WTF is a SuperColumn", the data model is described as Keyspace.SuperColumnFamily[Row][Super][Column] = value. Pandra does not have a Row type; instead each SCF has a keyID, which is the Row index (first key in brackets). This means that, in order to add rows to a SuperColumnFamily, we must create a NEW SuperColumnFamily object for every entry, with the same keyID (since Pandra uses that to mean the second member of the dotted pair) but a different name. This is backward: there should be a Row class or some such, which does what SCF does now (hold SuperColumns), and the SuperColumnFamily class should be repurposed to be solely a Map.

More than just a naming issue, this implementation has technical implications. Specifically, in PandraSuperColumnFamily::save(), there is a comment /* @todo there must be a better way */, followed by a loop over all of the SuperColumn children. There is a better way! The Thrift method batch_mutate takes a keyspace and a map<string, map<string, list<Mutation>>>. A Mutation, meanwhile, can describe a SuperColumn insertion, which is itself a list of Column insertions. Pandra is not making use of all of these levels of hierarchy: every save() call in Pandra's API could be implemented as a single Thrift call, with no need for multiple requests.

My rough sketch of an implementation would be:

class SCF {
  function save() {
    // batch_mutate expects: row key => column family name => list of mutations
    $mutationMap = array();
    foreach ($this->getRows() as $key => $superCol) {
      $mutationMap[$key] = array($this->name => array($superCol->getMutation())); // see below
    }
    // $this->client is assumed to hold the Thrift client; consistency level omitted for brevity
    $this->client->batch_mutate($this->keyspace, $mutationMap);
  }
}
class SuperColumn {
  function getMutation() {
    $cols = array();
    foreach ($this->getColumns() as $name => $value) {
      $cols[] = new ThriftColumn($name, $value);
    }
    return new ThriftMutation(INSERT, new ThriftSuperColumn($this->name, $cols));
  }
}

Obviously this glosses over quite a few details, like deletions, but I think the structure is right. I definitely sympathize with your erroneous (but see disclaimer at top!) implementation: even when you know exactly what to do it's hard to think about SuperColumnFamilies!
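
To make the nesting concrete, here is a minimal sketch of the shape batch_mutate expects (row key, then column family name, then a list of mutations), built from plain PHP arrays rather than the generated Thrift classes. The function name buildMutationMap, the array field names, and the sample data are all made up for illustration; none of it is Pandra's or Thrift's actual API.

```php
<?php
// Hypothetical helper (not part of Pandra or the Thrift bindings).
// Input:  row key => super column name => (column name => value)
// Output: row key => column family name => list of mutations
// Plain arrays stand in for the generated Thrift Mutation/SuperColumn/Column types.
function buildMutationMap($columnFamily, array $rows)
{
    $mutationMap = array();
    foreach ($rows as $rowKey => $superColumns) {
        $mutations = array();
        foreach ($superColumns as $superName => $columns) {
            $cols = array();
            foreach ($columns as $name => $value) {
                $cols[] = array('name' => $name, 'value' => $value);
            }
            // One mutation per SuperColumn, carrying all of its Columns.
            $mutations[] = array(
                'super_column' => array('name' => $superName, 'columns' => $cols),
            );
        }
        $mutationMap[$rowKey] = array($columnFamily => $mutations);
    }
    return $mutationMap;
}

// Illustrative data only: two rows in a made-up 'Users' super column family.
$map = buildMutationMap('Users', array(
    'alice' => array('address' => array('city' => 'Perth', 'zip' => '6000')),
    'bob'   => array('address' => array('city' => 'Oslo')),
));
```

The point is that both rows, and every SuperColumn and Column under them, end up in one structure that could be handed to a single batch_mutate call, instead of one request per SuperColumn.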

Michael Pearson
Owner

Thanks for the deeper level of thought :) A row (or keyspace) container would be best; it would also be great to tie authentication and connection pooling into core. I'm fine to do this, but did you have any other specifics in mind?

Alan Malloy

No specifics, particularly. I've just started to get into Pandra's internals, so I don't have enough insight to be helpful there; I really just noticed the weird API because it conflicted with what I knew about Cassandra. I'll let you know if I have any clever ideas while I'm working on 0.7.
