Cassandra is a partitioned row store, where rows are organized into tables with a required primary key.
The first component of a table’s primary key is the partition key; within a partition, rows are clustered by the remaining columns of the PK. Other columns may be indexed independent of the PK.
This allows pervasive denormalization to "pre-build" resultsets at update time, rather than doing expensive joins across the cluster.
- Stack Overflow
Sample Tables:
CREATE TABLE sensor_readings ( sensorID uuid, time_bucket int, timestamp bigint, reading decimal, PRIMARY KEY ((sensorID, time_bucket), timestamp) ) WITH CLUSTERING ORDER BY (timestamp DESC);
SELECT * FROM sensor_readings WHERE sensorID = 53755080-4676-11e4-916c-0800200c9a66 AND time_bucket IN (1411840800, 1411844400) AND timestamp >= 1411841700 AND timestamp <= 1411845300;
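The query above has to name every time_bucket that the timestamp range touches, so the application needs a way to derive them. Below is a minimal Python sketch of that step. It assumes one-hour buckets identified by the Unix timestamp (in seconds) at which the hour starts. That width is inferred from the two sample values being 3600 apart, not stated in the schema. The helper names time_bucket and buckets_for_range are hypothetical.

```python
from typing import List

BUCKET_WIDTH = 3600  # assumption: one-hour buckets, in seconds


def time_bucket(ts: int, width: int = BUCKET_WIDTH) -> int:
    """Floor a Unix timestamp (seconds) to the start of its bucket."""
    return ts - (ts % width)


def buckets_for_range(start: int, end: int, width: int = BUCKET_WIDTH) -> List[int]:
    """Return every bucket that overlaps the inclusive range [start, end]."""
    if end < start:
        raise ValueError("end must not precede start")
    first = time_bucket(start, width)
    last = time_bucket(end, width)
    return list(range(first, last + 1, width))


# The range used in the SELECT statement above.
buckets = buckets_for_range(1411841700, 1411845300)
print(buckets)  # [1411840800, 1411844400]
```

The resulting list is what would be bound to the IN clause on time_bucket. A wide time range produces a long list, so for long ranges it may be preferable to issue one query per bucket.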
CREATE TABLE IF NOT EXISTS
Sequential writes can cause hot spots: if the application tends to write or update a sequential block of rows at a time, the writes are not distributed across the cluster; they all go to one node. This is frequently a problem for applications dealing with timestamped data.
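To make the hot-spot effect concrete, here is a small Python sketch that counts how many writes land in each partition under two partition-key choices. The workload (4 sensors, one reading per minute for one hour) and the one-hour bucket width are assumptions made for illustration. The sketch only counts distinct partitions. It does not model how Cassandra's partitioner maps a partition to a node.

```python
from collections import Counter
import uuid

BUCKET_WIDTH = 3600  # assumption: one-hour buckets, in seconds


def time_bucket(ts, width=BUCKET_WIDTH):
    """Floor a Unix timestamp (seconds) to the start of its bucket."""
    return ts - (ts % width)


# Hypothetical workload: 4 sensors, each writing once a minute for one hour.
sensors = [uuid.UUID(int=i) for i in range(1, 5)]
start = 1411840800
writes = [(sensor, start + 60 * minute) for sensor in sensors for minute in range(60)]

# Choice A: partition by time bucket only.
# Every write in the hour shares a single partition.
by_time_only = Counter(time_bucket(ts) for _, ts in writes)

# Choice B: partition by (sensor, time bucket), like the sample table's key.
by_sensor_and_time = Counter((sensor, time_bucket(ts)) for sensor, ts in writes)

print(len(by_time_only), max(by_time_only.values()))              # 1 240
print(len(by_sensor_and_time), max(by_sensor_and_time.values()))  # 4 60
```

With the time bucket alone as the partition key, all 240 writes hit one partition. Adding the sensor ID to the partition key splits the same writes into 4 partitions of 60 each, which gives the cluster a chance to spread them.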