Separate document about consistency and persistence guarantees #770
Comments
This page will supersede (or enhance) the FAQ entry written for #725.
It would also be good to have a brief paragraph on changefeed consistency guarantees (currently they always behave like up-to-date reads, without the "consistent read" flag).
This document pertains to RethinkDB 2.1.

Consistency and durability settings

There are three settings that control consistency and durability:

Write acks (write_acks)
The possible values are "majority" (the default; a write is acknowledged once a majority of the voting replicas have confirmed it) and "single" (a write is acknowledged as soon as a single replica has confirmed it).

Durability (durability)
The possible values are "hard" (the default; writes are acknowledged only after being committed to disk) and "soft" (writes are acknowledged once they are in memory).

Read mode (read_mode)
The possible values are "single" (the default; reads return values from the primary replica's memory, which may not yet be safely committed), "majority" (reads return only values that are safely committed to a majority of replicas), and "outdated" (reads may be served from any replica and can return out-of-date values).
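For orientation, here is a sketch of where each of these settings is specified, assuming the 2.1 ReQL JavaScript driver (the option names and the config().update() route reflect my understanding of the 2.1 API and should be checked against the driver docs; conn is an assumed open connection, so this fragment is not runnable standalone):

```js
// write_acks and durability are per-table settings, stored in the
// table's configuration; they can be changed via config().update():
r.db("test").table("users").config().update({
  write_acks: "majority",  // or "single"
  durability: "hard"       // or "soft"
}).run(conn);

// read_mode is chosen per read, as an optional argument to table():
r.table("users", {readMode: "majority"}).get("id-1").run(conn);
```

The point is simply which knob lives where: write settings travel with the table, while the read mode is selected query by query.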
Changefeeds ignore the read_mode setting; they always behave as if it were set to "single".

Guarantees

Linearizability and atomicity

If write_acks is set to "majority", durability is set to "hard", and read_mode is set to "majority", then RethinkDB guarantees linearizability of individual atomic operations on individual documents.

Warning: The above linearizability guarantee is for atomic operations, not for queries. A single RethinkDB query will not necessarily execute as a single atomic operation. If you need to read and then modify a document as a single atomic operation, use the update or replace commands.

This can also be used to implement a check-and-set register. For example, the following query will atomically check whether the field foo equals old_value and, if so, set it to new_value:

    table.get(register_id).update({
        foo: r.branch(r.row("foo").eq(old_value), new_value, r.row("foo"))
    })

RethinkDB operations are never atomic across multiple keys. For this reason, RethinkDB is not considered an ACID database.

Currently, the query

    table.filter({id: register_id, foo: old_val}).update({foo: new_val})

also behaves as a check-and-set register. However, there has been some discussion of changing this behavior; see rethinkdb/rethinkdb#3992.

Availability

Except for brief periods, a table will remain fully available as long as more than half of the replicas for each shard are available, not counting non-voting replicas. If half or more of the voting replicas for a shard are lost, then read or write operations on that shard will fail. If the primary replica is lost, but more than half of the voting replicas are still available, an arbitrary voting replica will be elected as primary. The new primary will show up in the table's status output.

Reconfiguring a table (changing the number of shards, shard boundaries, etc.) will cause brief losses of availability at various points during the reconfiguration.

If half or more of the voting replicas of a shard are lost, the only way to recover availability is to run the emergency repair command. Running the emergency repair command invalidates the linearizability guarantees in this document. But see rethinkdb/rethinkdb#4357 for an exception / known bug in these availability guarantees.

Reads run in "outdated" mode will succeed as long as at least one replica for each of the relevant shards is available.

Trading off safety for performance

RethinkDB offers a sliding scale of safety versus performance guarantees. The default settings always choose safety over performance, except in one case: read_mode defaults to "single" rather than "majority". In normal operation, "single" reads return the same results as "majority" reads, but they may return stale or rolled-back values in the event of a failure. The same is true for "single" write acks: in normal operation they behave like "majority" acks, but during a failure an acknowledged write may later be rolled back. Note that the write_acks and durability settings do not change how a write is performed; they only affect when it is acknowledged to the client. Reads run in "outdated" mode additionally trade freshness for lower latency and higher availability.

Other notes

If you run the emergency repair command on a table, these guarantees will be invalidated.

There are two ways a write operation can fail. Sometimes it will fail definitively; other times it will fail indeterminately. You can examine the error message to see which type of failure happened. (This is a work in progress; ask @mlucy for details.) If a write fails definitively, no read will ever see it, even in the weaker read modes.
If it fails indeterminately, it's in a sort of limbo state; reads run in the weaker read modes may or may not see it.

@VeXocide @danielmewes -- please read over this and let me know if you spot any errors or things that are incompletely documented.
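The "more than half of the replicas" availability rule described above can be sketched numerically. This is a plain-JavaScript illustration (majority and isAvailable are hypothetical helpers, not part of any RethinkDB API):

```javascript
// A shard with n voting replicas needs strictly more than half of
// them reachable to stay available. Any two majorities of the same
// replica set must overlap in at least one replica, which is what
// makes "majority" reads and write acks safe.
function majority(n) {
  return Math.floor(n / 2) + 1;
}

function isAvailable(votingReplicas, reachable) {
  return reachable >= majority(votingReplicas);
}

console.log(majority(3));       // 2
console.log(isAvailable(3, 2)); // true  (2 of 3 reachable)
console.log(isAvailable(4, 2)); // false (exactly half is not enough)
```

Note the last case: with 4 voting replicas, losing 2 already makes the shard unavailable, which is why odd replica counts tolerate the same number of failures with one fewer server.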
Minor correction: The operation in
Note that this will either not happen for 2.1, or it will happen in a way which we wouldn't want to document. So as far as the 2.1 documentation is concerned, those operations are definitely not guaranteed to be atomic as a whole (and it's unclear if that will change). |
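For contrast with the filter-then-update form discussed here, the per-document semantics of the branch-based check-and-set from the document can be modeled in plain JavaScript (illustrative only; checkAndSet is a hypothetical local helper that models the outcome of the atomic update, not RethinkDB's execution):

```javascript
// Models: update({foo: r.branch(r.row("foo").eq(oldValue), newValue, r.row("foo"))})
// The "foo" field is replaced by newValue only when it currently equals
// oldValue; otherwise the document is returned unchanged.
function checkAndSet(doc, oldValue, newValue) {
  return Object.assign({}, doc, {
    foo: doc.foo === oldValue ? newValue : doc.foo
  });
}

const v1 = checkAndSet({id: "r1", foo: 1}, 1, 2); // matches: foo becomes 2
const v2 = checkAndSet(v1, 1, 99);                // no match: foo stays 2
console.log(v1.foo, v2.foo); // 2 2
```

Because the comparison and the write happen inside a single update on a single document, this form stays a correct check-and-set regardless of how filter-plus-update queries end up being executed.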
Note that even
Also: Very nice writeup 👍
If a "majority" read ever sees a write, then the write will never be rolled back. I don't see how network failure is related to this.
*Non-transitive network failure
That's true, but such a write might still have returned an indeterminate failure result. Hence a write that fails indeterminately might not only be seen by "single" reads, but also by "majority" reads.
OK, I think we're on the same page here.
One more thing that confused me while playing with our Raft version, and that I think is worth mentioning:
Closed in 3d59aa6
Nice document. Just a few small things I'd like to change:
I think this is confusing, because it's not clear what being "atomic" means with respect to these operations. I think we should remove that remark.
This is not entirely true. In addition to more than half of the voting replicas for each shard, we also need to have more than half of the voting replicas of the table overall available.
Since we don't expose shard boundaries directly and don't allow adjusting them explicitly, I think this should say: "(changing the number of shards, rebalancing, etc.)"
For RethinkDB 2.1, we should write a separate document that explains all the consistency and persistence guarantees and the different settings and their effects.
For example: If I perform a write with settings X, is the write guaranteed to be persistent if...
a) one or multiple replicas are temporarily shut down (e.g. maintenance, connectivity issue)
b) one or multiple replicas fail temporarily without an orderly shutdown (e.g. power failure, software crash)
c) one or multiple replicas fail permanently (e.g. storage failure)
Relevant settings are: soft durability vs. hard durability, majority vs. single acks
We should also specify how many replicas are allowed to fail in a given configuration.
Also: If I perform a read with settings X, am I guaranteed to see all previously acknowledged writes? Can I read a value that might get lost later in case a replica fails (e.g. a write that hasn't been acknowledged yet, or that hasn't been written to disk yet)?
Relevant settings are: use_outdated, the upcoming "consistent read" parameter (rethinkdb/rethinkdb#3895). It also matters which settings the relevant previous writes have been performed with.