Cassandra
Cassandra is a wide-row store NoSQL database.
Cassandra uses partitioners to distribute data across the nodes of a cluster. Nodes are organized into a hierarchy: a cluster contains one or more data centers, and each data center contains nodes.
Cassandra separates applications by keyspace, which roughly corresponds to a GeoWave namespace. A keyspace contains one or more column families.
Cassandra performs best when all of the data needed to satisfy a given query is located in the same partition, that is, when it shares the same partition key.
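To make the locality point concrete, here is a minimal sketch that assumes a hypothetical layout: a sortable row key (for example, a space-filling-curve value) is split into a coarse partition key (its first `PREFIX_BYTES` bytes) and a clustering remainder. A range scan whose keys share a prefix then reads a single partition. The key layout, `PREFIX_BYTES`, and the helper names are illustrative assumptions, not GeoWave's actual schema.

```python
from collections import defaultdict

PREFIX_BYTES = 2  # hypothetical: number of row-key bytes used as the partition key


def split_key(row_key: bytes) -> tuple[bytes, bytes]:
    """Split a row key into (partition key, clustering key)."""
    return row_key[:PREFIX_BYTES], row_key[PREFIX_BYTES:]


def build_partitions(row_keys):
    """Group row keys into partitions; rows inside each partition stay sorted."""
    partitions = defaultdict(list)
    for key in row_keys:
        pk, ck = split_key(key)
        partitions[pk].append(ck)
    for cks in partitions.values():
        cks.sort()
    return partitions


def partitions_for_range(start: bytes, end: bytes, row_keys):
    """Return the distinct partition keys that a [start, end] range scan must read."""
    return sorted({split_key(k)[0] for k in row_keys if start <= k <= end})


keys = [bytes([0x01, 0x00, i]) for i in range(10)] + [bytes([0x02, 0x00, i]) for i in range(10)]
print(partitions_for_range(bytes([1, 0, 2]), bytes([1, 0, 7]), keys))  # one partition
print(partitions_for_range(bytes([1, 0, 8]), bytes([2, 0, 1]), keys))  # two partitions
```

A query that stays within one prefix is served from one partition; a query that crosses a prefix boundary has to visit several, which is the cost the paragraph above warns about.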
The default partitioner is the Murmur3Partitioner. It may be tempting to use the ByteOrderedPartitioner instead, to match HBase and Accumulo behavior for the range scans that are fundamental to GeoWave's design. However, the ByteOrderedPartitioner is not recommended, because it makes load balancing difficult and creates hot spots on writes.
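The trade-off can be illustrated with a toy token ring. The sketch below places 1,000 sequential row keys on four nodes twice. The first pass uses a hash-based token that stands in for Murmur3Partitioner. Because Murmur3 is not in the Python standard library, it substitutes `hashlib.md5`. The second pass uses the key bytes themselves as the token, mimicking ByteOrderedPartitioner. The evenly spaced node tokens, the unsigned 64-bit token space, and the helper names are assumptions made for illustration only.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

NUM_NODES = 4
TOKEN_SPACE = 2**64  # toy unsigned 64-bit space (Murmur3Partitioner really uses signed 64-bit tokens)
# Hypothetical ring: each node owns the token range ending at its (evenly spaced) token.
NODE_TOKENS = [(i + 1) * (TOKEN_SPACE // NUM_NODES) - 1 for i in range(NUM_NODES)]


def node_for_token(token: int) -> int:
    """Index of the first node whose token is >= token, wrapping around the ring."""
    return bisect_left(NODE_TOKENS, token) % NUM_NODES


def hashed_token(key: bytes) -> int:
    """Hash-based token (MD5 stand-in for Murmur3): key order is NOT preserved."""
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big")


def ordered_token(key: bytes) -> int:
    """Byte-ordered token: the 8-byte key itself, so key order IS preserved."""
    return int.from_bytes(key, "big")


def place(keys, token_fn):
    """Node index that owns each key under the given token function."""
    return [node_for_token(token_fn(k)) for k in keys]


# Sequential 8-byte row keys, e.g. consecutive space-filling-curve values.
keys = [i.to_bytes(8, "big") for i in range(1000)]

for name, fn in [("hashed", hashed_token), ("byte-ordered", ordered_token)]:
    nodes = place(keys, fn)
    scan_nodes = set(nodes[100:200])  # nodes a range scan over keys 100..199 must contact
    print(name, "writes per node:", dict(Counter(nodes)), "| nodes per range scan:", len(scan_nodes))
```

With hashing, writes spread roughly evenly, but a contiguous key range is scattered across every node. With byte ordering, the range stays on one node, and so does every write to neighboring keys, which is the hot spot described above.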
Choosing the correct compaction strategy affects GeoWave performance. Compaction consolidates SSTables, removing deleted rows and keeping a single version of each updated or inserted row. SSTables store their rows sorted by partition. Because GeoWave targets read performance, LeveledCompactionStrategy should generally be used. LeveledCompactionStrategy organizes SSTables into levels and consolidates them into higher-level tables of non-overlapping data. Each level can hold ten times as much data as the level below it. In applications with low write volume and high read volume, a read typically has to process only one higher-level SSTable. In applications with high write volume, many SSTables accumulate in level 0, and reads require more disk operations. For streaming applications, the DateTieredCompactionStrategy may therefore be more appropriate, because it compacts based on the age of the data.
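The factor-of-ten growth per level can be estimated with a few lines of arithmetic. The sketch below assumes a fixed target SSTable size of 160 MB, the usual `sstable_size_in_mb` default for LeveledCompactionStrategy (check the value for your Cassandra version), and a fan-out of 10. It computes how many levels a given data volume occupies and an upper bound on the number of SSTables a single read may consult. Because SSTables within a level above 0 do not overlap, a read checks at most one SSTable per such level, plus every SSTable still waiting in level 0. The function names are hypothetical.

```python
SSTABLE_MB = 160  # assumed LCS target SSTable size (sstable_size_in_mb)
FANOUT = 10       # each level holds 10x the data of the level below it


def level_capacity_mb(level: int) -> int:
    """Maximum data (MB) held by a level >= 1 under leveled compaction."""
    return SSTABLE_MB * FANOUT ** level


def levels_needed(data_mb: float) -> int:
    """Smallest number of levels (counting from L1) whose combined capacity holds data_mb."""
    level, total = 0, 0
    while total < data_mb:
        level += 1
        total += level_capacity_mb(level)
    return level


def sstables_per_read_upper_bound(data_mb: float, level0_sstables: int) -> int:
    """One non-overlapping SSTable per level, plus every (possibly overlapping) L0 SSTable."""
    return levels_needed(data_mb) + level0_sstables


print(levels_needed(100_000))                       # ~100 GB of data -> 3
print(sstables_per_read_upper_bound(100_000, 0))    # read-heavy, empty L0 -> 3
print(sstables_per_read_upper_bound(100_000, 32))   # write-heavy, L0 backlog -> 35
```

The last two lines show why a write-heavy workload hurts leveled compaction: the L0 backlog, not the number of levels, dominates the SSTables a read may have to check.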
The primary index for a Cassandra table is the row key. Secondary indices in Cassandra follow the same definition as in GeoWave: they are maintained as separate structures that reference (point to) the data in a table, which is sorted by its primary key.
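A secondary index in this sense can be sketched as a separate sorted structure whose entries point back to primary keys. The in-memory model below uses a primary table keyed by row key, plus a sorted list of `(attribute value, row key)` pairs searched with `bisect`. It only illustrates the indirection; it is not how Cassandra physically stores its indexes, and the class and method names are hypothetical.

```python
from bisect import bisect_left, insort


class IndexedTable:
    """Toy table: rows keyed by primary key, plus one secondary index on an attribute."""

    def __init__(self, indexed_attr: str):
        self.indexed_attr = indexed_attr
        self.rows = {}   # primary index: row key -> row (dict of columns)
        self.index = []  # secondary index: sorted (attribute value, row key) pairs

    def put(self, row_key: str, row: dict) -> None:
        # Assumes each row key is inserted once; updates are not handled in this sketch.
        self.rows[row_key] = row
        insort(self.index, (row[self.indexed_attr], row_key))

    def find(self, value):
        """Look up row keys in the index, then dereference them in the primary table."""
        i = bisect_left(self.index, (value, ""))  # "" sorts before any row key string
        results = []
        while i < len(self.index) and self.index[i][0] == value:
            results.append(self.rows[self.index[i][1]])
            i += 1
        return results


t = IndexedTable("type")
t.put("r1", {"type": "road", "name": "A"})
t.put("r2", {"type": "river", "name": "B"})
t.put("r3", {"type": "road", "name": "C"})
print([r["name"] for r in t.find("road")])  # ['A', 'C']
```

A lookup costs two steps, an index search followed by a primary-key fetch per match. That indirection is why a query that can be answered from the row key alone is usually cheaper.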
Cassandra's data model is akin to a nested sorted map: row keys reference a sorted map of column keys to values. Both levels are unbounded, which supports wide rows. Columns can be organized into super columns (columns within columns) and composite columns (several key components combined into a single column name). Cassandra allows up to 2 billion columns per row. Column key names should be short, because the name is stored with every cell. It is permissible to place the value in the column key and leave the column value null.
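The nested-sorted-map view can be sketched directly. In the toy model below, each row key maps to column keys kept in sorted order. A composite column key is a Python tuple, so columns sort by the first component and then by the second. A column can carry its data in the key and store an empty value. The `WideRowStore` class, its methods, and the `(time bucket, feature id)` key layout are hypothetical. The model ignores persistence, timestamps, and tombstones.

```python
from bisect import bisect_left, bisect_right, insort


class WideRowStore:
    """Toy model: row key -> sorted map of column key -> value."""

    def __init__(self):
        self._rows = {}  # row key -> (sorted list of column keys, dict of column values)

    def put(self, row_key, column_key, value=b""):
        keys, values = self._rows.setdefault(row_key, ([], {}))
        if column_key not in values:
            insort(keys, column_key)  # keep column keys sorted within the row
        values[column_key] = value

    def slice(self, row_key, start, end):
        """Return (column key, value) pairs with start <= column key <= end, in order."""
        keys, values = self._rows.get(row_key, ([], {}))
        lo, hi = bisect_left(keys, start), bisect_right(keys, end)
        return [(k, values[k]) for k in keys[lo:hi]]


store = WideRowStore()
# Hypothetical composite column keys (time bucket, feature id); the data lives in the key.
for bucket, fid in [(3, "f9"), (1, "f2"), (2, "f5"), (1, "f1")]:
    store.put("tile-42", (bucket, fid))

# Column slice over buckets 1..2; "\uffff" is a sentinel that sorts after any feature id.
print(store.slice("tile-42", (1, ""), (2, "\uffff")))
# [((1, 'f1'), b''), ((1, 'f2'), b''), ((2, 'f5'), b'')]
```

Because the columns of a row stay sorted, a contiguous slice of a wide row is a single range lookup, the same access pattern GeoWave relies on for range scans.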