Cassandra Data Modeling

Introduction:

Cassandra is a partitioned row store, where rows are organized into tables with a required primary key.

The first component of a table’s primary key is the partition key; within a partition, rows are clustered (sorted) by the remaining primary key columns. Other columns may be indexed independently of the primary key.
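The relationship between the partition key and the clustering columns can be illustrated with a toy in-memory model. This is only a sketch: the class `PartitionedTable` and its methods are hypothetical names invented for illustration, not part of any Cassandra driver, and it ignores replication, storage, and ordering direction.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy model of a partitioned row store (hypothetical, for illustration only)."""

    def __init__(self):
        # partition key -> {clustering key -> value}
        self._partitions = defaultdict(dict)

    def insert(self, partition_key, clustering_key, value):
        # Writing the same full primary key again overwrites the row (an upsert).
        self._partitions[partition_key][clustering_key] = value

    def read_slice(self, partition_key, start, end):
        # A read touches a single partition and returns rows
        # ordered by clustering key within the requested range.
        rows = self._partitions.get(partition_key, {})
        return [(ck, rows[ck]) for ck in sorted(rows) if start <= ck <= end]


table = PartitionedTable()
table.insert(("sensor-1", 100), 30, 1.5)
table.insert(("sensor-1", 100), 10, 1.0)
table.insert(("sensor-1", 100), 20, 1.2)
table.insert(("sensor-2", 100), 15, 9.9)
slice_rows = table.read_slice(("sensor-1", 100), 10, 20)
```

Here `slice_rows` contains only the rows of the `("sensor-1", 100)` partition whose clustering key falls between 10 and 20, in sorted order, regardless of the order in which they were inserted.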

This design encourages pervasive denormalization: result sets are "pre-built" at update time, so that reads do not need expensive joins across the cluster.
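A minimal sketch of what "pre-building" result sets at update time means, using plain Python dictionaries in place of tables. The names `readings_by_sensor`, `readings_by_day`, and `record_reading` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict

# Two hypothetical query tables, each keyed for exactly one read pattern.
readings_by_sensor = defaultdict(list)  # partition key: sensor_id
readings_by_day = defaultdict(list)     # partition key: day


def record_reading(sensor_id, day, value):
    """Write the same fact to every table that serves a query."""
    row = {"sensor_id": sensor_id, "day": day, "value": value}
    readings_by_sensor[sensor_id].append(row)
    readings_by_day[day].append(row)


record_reading("sensor-1", "2014-09-27", 20.5)
record_reading("sensor-2", "2014-09-27", 18.0)
record_reading("sensor-1", "2014-09-28", 21.0)

# Each query is now a single lookup by partition key; no join is needed.
by_sensor = [r["value"] for r in readings_by_sensor["sensor-1"]]
by_day = [r["value"] for r in readings_by_day["2014-09-27"]]
```

The cost of this approach is paid on the write path: every update touches each table that serves a query, in exchange for cheap single-partition reads.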

Basic Rules of Cassandra Data Modeling

Introduction To Apache Cassandra with Patrick McFadin

Tech Talk: Cassandra Data Modeling TimeSeries with Patrick McFadin

Introduction to Cassandra Data Model | Edureka

Midwest.io 2014 - Time Series with Apache Cassandra - Patrick McFadin

cassandra data modeling - Practical considerations @ netflix

https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/cassandra/schema/v001.cql.tmpl

Stack Overflow: Cassandra UUID vs TimeUUID benefits and disadvantages?

Sample Tables:

CREATE TABLE sensor_readings (
    sensorID uuid,
    time_bucket int,
    timestamp bigint,
    reading decimal,
    PRIMARY KEY ((sensorID, time_bucket), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

SELECT * FROM sensor_readings
WHERE sensorID = 53755080-4676-11e4-916c-0800200c9a66
  AND time_bucket IN (1411840800, 1411844400)
  AND timestamp >= 1411841700
  AND timestamp <= 1411845300;
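The `time_bucket` values in the query above have to be computed by the client. The sketch below shows one way to do that, assuming hourly buckets identified by the epoch second at which the hour starts. That width is an assumption inferred from the two bucket values in the query, which are 3600 seconds apart; the helper names `bucket_for` and `buckets_for_range` are hypothetical.

```python
BUCKET_SECONDS = 3600  # assumption: one partition per sensor per hour


def bucket_for(ts):
    """Round an epoch-second timestamp down to the start of its bucket."""
    return ts - ts % BUCKET_SECONDS


def buckets_for_range(start_ts, end_ts):
    """List every bucket that a [start_ts, end_ts] query has to read."""
    if end_ts < start_ts:
        raise ValueError("end_ts must not be earlier than start_ts")
    return list(range(bucket_for(start_ts), bucket_for(end_ts) + 1, BUCKET_SECONDS))


# The time range used in the SELECT statement above.
buckets = buckets_for_range(1411841700, 1411845300)
# buckets == [1411840800, 1411844400]
```

The writer uses `bucket_for` on each reading's timestamp to fill in `time_bucket`, and the reader uses `buckets_for_range` to build the `IN (...)` list, so both sides agree on which partitions hold a given time range.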

CREATE TABLE IF NOT EXISTS ${keyspace}.traces (
    trace_id        blob,
    span_id         bigint,
    span_hash       bigint,
    parent_id       bigint,
    operation_name  text,
    flags           int,
    start_time      bigint,
    duration        bigint,
    tags            list<frozen<keyvalue>>,
    logs            list<frozen<log>>,
    refs            list<frozen<span_ref>>,
    process         frozen<process>,
    PRIMARY KEY (trace_id, span_id, span_hash)
) WITH compaction = {
        'compaction_window_size': '1',
        'compaction_window_unit': 'HOURS',
        'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy'
    }
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = ${trace_ttl}
    AND speculative_retry = 'NONE'
    AND gc_grace_seconds = 10800; -- 3 hours of downtime acceptable on nodes

CREATE TABLE IF NOT EXISTS ${keyspace}.duration_index (
    service_name    text,      // service name
    operation_name  text,      // operation name, or blank for queries without span name
    bucket          timestamp, // time bucket, - the start_time of the given span rounded to an hour
    duration        bigint,    // span duration, in microseconds
    start_time      bigint,
    trace_id        blob,
    PRIMARY KEY ((service_name, operation_name, bucket), duration, start_time, trace_id)
) WITH CLUSTERING ORDER BY (duration DESC, start_time DESC)
    AND compaction = {
        'compaction_window_size': '1',
        'compaction_window_unit': 'HOURS',
        'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy'
    }
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = ${trace_ttl}
    AND speculative_retry = 'NONE'
    AND gc_grace_seconds = 10800; -- 3 hours of downtime acceptable on nodes

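Sequential writes can cause hot spots. If the application tends to write or update a sequential block of rows at a time, the writes are not distributed across the cluster: they all go to one node. This is a frequent problem for applications that deal with timestamped data. One common mitigation, used by the sensor_readings table above, is to combine a time bucket with another column such as the sensor ID in the partition key.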
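The effect can be seen in a rough simulation. The sketch below is a simplification under stated assumptions: it uses a hypothetical four-node cluster, maps a partition key to a node with an MD5 hash modulo the node count (a stand-in for the real partitioner and token ring), and ignores replication. The names `node_for` and `nodes_hit` are invented for this example.

```python
import hashlib

NODES = 4  # hypothetical cluster size


def node_for(partition_key):
    """Map a partition key to a node by hashing it (MD5 as a stand-in partitioner)."""
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODES


def nodes_hit(partition_keys):
    """Return the set of nodes that receive writes for the given partition keys."""
    return {node_for(key) for key in partition_keys}


hour_bucket = 1411840800
sensors = [f"sensor-{i}" for i in range(100)]

# Partition key is the time bucket alone: every write in this hour shares one key.
time_only = nodes_hit((hour_bucket,) for _ in sensors)

# Partition key is (sensor, time bucket): each sensor gets its own partition.
composite = nodes_hit((sensor, hour_bucket) for sensor in sensors)
```

With the time-only key, all 100 writes land on a single node for the whole hour, while the composite key spreads the same writes over several nodes.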
