
Recovery optimization for persistent channels #4

Closed

krasserm opened this issue Jan 3, 2014 · 3 comments

krasserm (Owner) commented Jan 3, 2014

Durable queues on top of Cassandra are a known anti-pattern. The issues related to reading a large number of tombstones can be addressed by introducing optimizations at the persistent channel level and at the journal level.

Possible optimizations at the persistent channel level are (copied from a code comment in PersistentChannel.scala; a sketch of the savepoint idea follows the quoted comment):

// TODO: avoid scanning over large number of tombstones during recovery
//
// Introduce an optimization to address issues mentioned in
// http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
// (which is also relevant when using a local LevelDB).
//
// This requires that recovery should start from the first
// non-deleted message rather than from sequence number 1.
// This can be achieved by taking empty snapshots (savepoints) 
// to set a recovery starting point. During recovery
//
// - when the first replayed message is received, take its
//   sequence number n and write a savepoint at n - 1.
//   This ensures that the next recovery skips a possible
//   expensive scan over n - 1 messages with a tombstone.
// - when the savepoint has been successfully written, delete
//   all savepoints that are older than n - 1.
//
// Writing a savepoint at lastSequenceNr - 1 requires a direct
// interaction with the snapshot store actor.
//
// Alternative/addition:
//
// The RequestReader actor of a persistent channel could
// also process confirmation messages and compute starting
// points for recovery. At periodic intervals, these starting
// points are then persisted as savepoints.
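
A minimal sketch of the savepoint idea, using hypothetical Journal and SavepointStore abstractions (these names and signatures are made up for illustration and are not the actual PersistentChannel internals):

// Hypothetical abstractions; the real implementation would talk to the
// Akka Persistence journal and snapshot store actors.
trait Journal {
  // Replays all non-deleted messages with sequence numbers >= fromSequenceNr.
  def replay(persistenceId: String, fromSequenceNr: Long)(callback: (Long, Any) => Unit): Unit
}

trait SavepointStore {
  def loadLatest(persistenceId: String): Option[Long]               // latest savepoint sequence number
  def save(persistenceId: String, sequenceNr: Long): Unit           // write an empty snapshot ("savepoint")
  def deleteOlderThan(persistenceId: String, sequenceNr: Long): Unit
}

class SavepointRecovery(journal: Journal, savepoints: SavepointStore) {

  def recover(persistenceId: String)(handler: (Long, Any) => Unit): Unit = {
    // Start replay from the latest savepoint instead of sequence number 1,
    // skipping the potentially expensive scan over tombstoned entries below it.
    val from = savepoints.loadLatest(persistenceId).getOrElse(0L) + 1L
    var savepointWritten = false

    journal.replay(persistenceId, from) { (sequenceNr, payload) =>
      if (!savepointWritten) {
        // The first replayed (i.e. non-deleted) message has sequence number n:
        // write a savepoint at n - 1 so the next recovery starts there ...
        savepoints.save(persistenceId, sequenceNr - 1L)
        // ... and, once that write has succeeded, drop older savepoints.
        savepoints.deleteOlderThan(persistenceId, sequenceNr - 1L)
        savepointWritten = true
      }
      handler(sequenceNr, payload)
    }
  }
}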

One optimization at the journal level is to delete entire rows once all of their columns have been marked as deleted (i.e. carry a tombstone). A prerequisite is row splitting, which is already implemented by #1.
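
A rough sketch of this bookkeeping, assuming fixed-size row splitting as introduced by #1 (rowSize, rowOf and markDeleted are illustrative names, not the plugin's actual API):

// Tracks per-row deletion counts; assumes each sequence number is marked
// as deleted at most once.
class RowDeletionTracker(rowSize: Long) {
  require(rowSize > 0)

  private val deletedPerRow =
    scala.collection.mutable.Map.empty[Long, Long].withDefaultValue(0L)

  // Row (partition) number that a given sequence number is stored in.
  private def rowOf(sequenceNr: Long): Long = (sequenceNr - 1L) / rowSize

  // Called when a single message is marked as deleted. Returns the row number
  // once all columns of that row are deleted, i.e. when the whole row
  // (including its column tombstones) can be dropped with a single row delete.
  def markDeleted(sequenceNr: Long): Option[Long] = {
    val row = rowOf(sequenceNr)
    val deleted = deletedPerRow(row) + 1L
    deletedPerRow(row) = deleted
    if (deleted == rowSize) { deletedPerRow.remove(row); Some(row) } else None
  }
}

When markDeleted returns Some(row), the journal would issue one row-level delete for that partition, replacing many column tombstones with a single row tombstone.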

Furthermore, reads from the journal with no lower bound (i.e. starting from sequence number 1) are only done during recovery. All other reads (done by View or PersistentChannel) have a lower bound and are therefore fast. Assuming infrequent persistent channel recoveries, Cassandra can be configured so that tombstone garbage collection is likely to occur between recoveries.
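
For example, lowering gc_grace_seconds on the message table makes it more likely that tombstones are compacted away between two recoveries. A hedged sketch using the DataStax Java driver; the keyspace/table names, contact point and value are placeholders to adapt to the actual deployment:

import com.datastax.driver.core.Cluster

object TombstoneGcTuning extends App {
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect()

  // gc_grace_seconds controls how long tombstones must be kept before they
  // become eligible for removal during compaction. Pick a value shorter than
  // the expected time between persistent channel recoveries, but long enough
  // for repair/hinted handoff to propagate the deletes.
  session.execute("ALTER TABLE akka.messages WITH gc_grace_seconds = 86400")

  session.close()
  cluster.close()
}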

krasserm (Owner, Author) commented:

Wrong reference from commit.

krasserm reopened this May 12, 2014

rkuhn commented May 30, 2014

@bantonsson: this might interest you as well

krasserm (Owner, Author) commented Jul 1, 2014

Obsolete with the deprecation of persistent channels in Akka 2.3.4.

krasserm closed this as completed Jul 1, 2014
jypma pushed a commit to jypma/akka-persistence-cassandra that referenced this issue Jan 20, 2016
* The "Column family ID mismatch" exception
  seems to happen when two different sessions
  create the keyspace at the "same" time and then
  they create the same tables. The exception is thrown
  when creating the tables.
* This may happen when write and read-side journals are
  started at the same time
* Changed the retry logic to retry not only the connect and
  the keyspace creation, but the whole initialization including
  creation of tables
* Removed keyspace-autocreate-retries config, connect-retries
  is enough
* Changed connect-retry-delay to 1s
* Create of metadata table also from read journal
* Also placed the execute of creation of keyspace and tables in
  synchronized block to reduce the risk of "Column family ID mismatch"
  This is not important for correctness, but improves the "experience"
  because the error logging and retries are avoided when running on
  single machine (dev)
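
A minimal sketch of the retry-and-synchronize approach described above; the retry helper, the attempt count and the createKeyspace/createTables parameters are illustrative, not the plugin's actual code:

import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object JournalInitialization {

  // Retries `body` up to `attempts` times with a fixed delay between attempts.
  def retry[A](attempts: Int, delay: FiniteDuration)(body: => A): A =
    Try(body) match {
      case Success(a) => a
      case Failure(_) if attempts > 1 =>
        Thread.sleep(delay.toMillis)
        retry(attempts - 1, delay)(body)
      case Failure(e) => throw e
    }

  // Running schema creation inside a process-wide lock reduces the chance that
  // the write- and read-side journals race each other on a single node, which
  // can surface as "Column family ID mismatch". This is not needed for
  // correctness; it only avoids noisy error logging and retries in dev mode.
  def initialize(createKeyspace: () => Unit, createTables: () => Unit): Unit =
    this.synchronized {
      // Retry the whole initialization, not only the connect/keyspace step.
      retry(attempts = 3, delay = 1.second) {
        createKeyspace()
        createTables()
      }
    }
}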