
Recovery optimization for persistent channels #4

Closed

krasserm opened this issue Jan 3, 2014 · 3 comments

krasserm (Owner) commented Jan 3, 2014

Durable queues on top of Cassandra are a known anti-pattern. The issues related to reading a large number of tombstones can be addressed by introducing optimizations at the persistent channel level and at the journal level.

Possible optimizations at the persistent channel level are (copied from a code comment in PersistentChannel.scala; a sketch of the savepoint idea follows the quoted comment):

// TODO: avoid scanning over large number of tombstones during recovery
//
// Introduce an optimization to address issues mentioned in
// http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
// (which is also relevant when using a local LevelDB).
//
// This requires that recovery should start from the first
// non-deleted message rather than from sequence number 1.
// This can be achieved by taking empty snapshots (savepoints) 
// to set a recovery starting point. During recovery
//
// - when the first replayed message is received, take its
//   sequence number n and write a savepoint at n - 1.
//   This ensures that the next recovery skips a possible
//   expensive scan over n - 1 messages with a tombstone.
// - when the savepoint has been successfully written, delete
//   all savepoints that are older than n - 1.
//
// Writing a savepoint at lastSequenceNr - 1 requires a direct
// interaction with the snapshot store actor.
//
// Alternative/addition:
//
// The RequestReader actor of a persistent channel could
// also process confirmation messages and compute starting
// points for recovery. At periodic intervals, these starting
// points are then persisted as savepoints.
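
A minimal sketch of the savepoint idea, using hypothetical Journal and SavepointStore abstractions (these names and signatures are made up for illustration and are not the actual PersistentChannel internals):

// Hypothetical abstractions; the real implementation would talk to the
// Akka Persistence journal and snapshot store actors.
trait Journal {
  // Replays all non-deleted messages with sequence numbers >= fromSequenceNr.
  def replay(persistenceId: String, fromSequenceNr: Long)(callback: (Long, Any) => Unit): Unit
}

trait SavepointStore {
  def loadLatest(persistenceId: String): Option[Long]               // latest savepoint sequence number
  def save(persistenceId: String, sequenceNr: Long): Unit           // write an empty snapshot ("savepoint")
  def deleteOlderThan(persistenceId: String, sequenceNr: Long): Unit
}

class SavepointRecovery(journal: Journal, savepoints: SavepointStore) {

  def recover(persistenceId: String)(handler: (Long, Any) => Unit): Unit = {
    // Start replay from the latest savepoint instead of sequence number 1,
    // skipping the potentially expensive scan over tombstoned entries below it.
    val from = savepoints.loadLatest(persistenceId).getOrElse(0L) + 1L
    var savepointWritten = false

    journal.replay(persistenceId, from) { (sequenceNr, payload) =>
      if (!savepointWritten) {
        // The first replayed (i.e. non-deleted) message has sequence number n:
        // write a savepoint at n - 1 so the next recovery starts there ...
        savepoints.save(persistenceId, sequenceNr - 1L)
        // ... and, once that write has succeeded, drop older savepoints.
        savepoints.deleteOlderThan(persistenceId, sequenceNr - 1L)
        savepointWritten = true
      }
      handler(sequenceNr, payload)
    }
  }
}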

One optimization at the journal level is to delete entire rows once all of their columns have been marked as deleted (i.e. carry a tombstone). A prerequisite is row splitting, which is already implemented by #1.
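
A rough sketch of this bookkeeping, assuming fixed-size row splitting as introduced by #1 (rowSize, rowOf and markDeleted are illustrative names, not the plugin's actual API):

// Tracks per-row deletion counts; assumes each sequence number is marked
// as deleted at most once.
class RowDeletionTracker(rowSize: Long) {
  require(rowSize > 0)

  private val deletedPerRow =
    scala.collection.mutable.Map.empty[Long, Long].withDefaultValue(0L)

  // Row (partition) number that a given sequence number is stored in.
  private def rowOf(sequenceNr: Long): Long = (sequenceNr - 1L) / rowSize

  // Called when a single message is marked as deleted. Returns the row number
  // once all columns of that row are deleted, i.e. when the whole row
  // (including its column tombstones) can be dropped with a single row delete.
  def markDeleted(sequenceNr: Long): Option[Long] = {
    val row = rowOf(sequenceNr)
    val deleted = deletedPerRow(row) + 1L
    deletedPerRow(row) = deleted
    if (deleted == rowSize) { deletedPerRow.remove(row); Some(row) } else None
  }
}

When markDeleted returns Some(row), the journal would issue one row-level delete for that partition, replacing many column tombstones with a single row tombstone.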

Furthermore, reads from the journal with no lower bound (i.e. starting from sequence number 1) are only done during recovery. All other reads (done by View or PersistentChannel) have a lower bound and are therefore fast. Assuming infrequent persistent channel recoveries, Cassandra can be configured so that tombstone garbage collection is likely to occur between recoveries.
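
For example, lowering gc_grace_seconds on the message table makes it more likely that tombstones are compacted away between two recoveries. A hedged sketch using the DataStax Java driver; the keyspace/table names, contact point and value are placeholders to adapt to the actual deployment:

import com.datastax.driver.core.Cluster

object TombstoneGcTuning extends App {
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect()

  // gc_grace_seconds controls how long tombstones must be kept before they
  // become eligible for removal during compaction. Pick a value shorter than
  // the expected time between persistent channel recoveries, but long enough
  // for repair/hinted handoff to propagate the deletes.
  session.execute("ALTER TABLE akka.messages WITH gc_grace_seconds = 86400")

  session.close()
  cluster.close()
}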

krasserm (Owner, Author) commented:

Wrong reference from commit.

krasserm reopened this May 12, 2014

rkuhn commented May 30, 2014

@bantonsson: this might interest you as well

krasserm (Owner, Author) commented Jul 1, 2014

Obsolete with the deprecation of persistent channels in Akka 2.3.4.

krasserm closed this as completed Jul 1, 2014
jypma pushed a commit to jypma/akka-persistence-cassandra that referenced this issue Jan 20, 2016
* The "Column family ID mismatch" exception
  seems to happen when two different sessions
  create the keyspace at the "same" time and then
  they create the same tables. The exception is thrown
  when creating the tables.
* This may happen when write and read-side journals are
  started at the same time
* Changed the retry logic to retry not only the connect and
  the keyspace creation, but the whole initialization including
  creation of tables
* Removed keyspace-autocreate-retries config, connect-retries
  is enough
* Changed connect-retry-delay to 1s
* Create of metadata table also from read journal
* Also placed the execute of creation of keyspace and tables in
  synchronized block to reduce the risk of "Column family ID mismatch"
  This is not important for correctness, but improves the "experience"
  because the error logging and retries are avoided when running on
  single machine (dev)
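
A minimal sketch of the retry-and-synchronize approach described above; the retry helper, the attempt count and the createKeyspace/createTables parameters are illustrative, not the plugin's actual code:

import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object JournalInitialization {

  // Retries `body` up to `attempts` times with a fixed delay between attempts.
  def retry[A](attempts: Int, delay: FiniteDuration)(body: => A): A =
    Try(body) match {
      case Success(a) => a
      case Failure(_) if attempts > 1 =>
        Thread.sleep(delay.toMillis)
        retry(attempts - 1, delay)(body)
      case Failure(e) => throw e
    }

  // Running schema creation inside a process-wide lock reduces the chance that
  // the write- and read-side journals race each other on a single node, which
  // can surface as "Column family ID mismatch". This is not needed for
  // correctness; it only avoids noisy error logging and retries in dev mode.
  def initialize(createKeyspace: () => Unit, createTables: () => Unit): Unit =
    this.synchronized {
      // Retry the whole initialization, not only the connect/keyspace step.
      retry(attempts = 3, delay = 1.second) {
        createKeyspace()
        createTables()
      }
    }
}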